Why isn't the following decision theory optimal?

Elliot_Olds

10 Why isn't the following decision theory optimal?

16th Apr 2015

2 min read

10

I've recently read the decision theory FAQ, as well as Eliezer's TDT paper. When reading the TDT paper, a simple decision procedure occurred to me which as far as I can tell gets the correct answer to every tricky decision problem I've seen. As discussed in the FAQ above, evidential decision theory get's the chewing gum problem wrong, causal decision theory gets Newcomb's problem wrong, and TDT gets counterfactual mugging wrong.

In the TDT paper, Eliezer postulates an agent named Gloria (page 29), who is defined as an agent who maximizes decision-determined problems. He describes how a CDT-agent named Reena would want to transform herself into Gloria. Eliezer writes

By Gloria’s nature, she always already has the decision-type causal agents wish they had, without need of precommitment.

Eliezer then later goes on the develop TDT, which is supposed to construct Gloria as a byproduct.

Gloria, as we have defined her, is defined only over completely decision-determined problems of which she has full knowledge. However, the agenda of this manuscript is to introduce a formal, general decision theory which reduces to Gloria as a special case.

Why can't we instead construct Gloria directly, using the idea of the thing that CDT agents wished they were? Obviously we can't just postulate a decision algorithm that we don't know how to execute, and then note that a CDT agent would wish they had that decision algorithm, and pretend we had solved the problem. We need to be able to describe the ideal decision algorithm to a level of detail that we could theoretically program into an AI.

Consider this decision algorithm, which I'll temporarily call Nameless Decision Theory (NDT) until I get feedback about whether it deserves a name: you should always make the decision that a CDT-agent would have wished he had pre-committed to, if he had previously known he'd be in his current situation and had the opportunity to precommit to a decision.

In effect, you are making an general precommittment to behave as if you made all specific precommitments that would ever be advantageous to you.

NDT is so simple, and Eliezer comes so close to stating it in his discussion of Gloria, that I assume there is some flaw with it that I'm not seeing. Perhaps NDT does not count as a "real"/"well defined" decision procedure, or can't be formalized for some reason? Even so, it does seem like it'd be possible to program an AI to behave in this way.

Can someone give an example of a decision problem for which this decision procedure fails? Or for which there are multiple possible precommitments that you would have wished you'd made and it's not clear which one is best?

EDIT: I now think this definition of NDT better captures what I was trying to express: You should always make the decision that a CDT-agent would have wished he had precommitted to, if he had previously considered the possibility of his current situation and had the opportunity to costlessly precommit to a decision.

Personal Blog

10

New Comment

Rendering 0/30 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 10:36 AM

Moderation Log

10 Why isn't the following decision theory optimal?

by Elliot_Olds

16th Apr 2015

2 min read

10

By Gloria’s nature, she always already has the decision-type causal agents wish they had, without need of precommitment.

Eliezer then later goes on the develop TDT, which is supposed to construct Gloria as a byproduct.

Gloria, as we have defined her, is defined only over completely decision-determined problems of which she has full knowledge. However, the agenda of this manuscript is to introduce a formal, general decision theory which reduces to Gloria as a special case.

In effect, you are making an general precommittment to behave as if you made all specific precommitments that would ever be advantageous to you.

Personal Blog

10

New Comment

Rendering 0/30 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 10:36 AM

Moderation Log

More from Elliot_Olds

Curated and popular this week

30Comments

Comment Permalink

So8res11y50

In the retro blackmail, CDT does not precommit to refusing even if it's given the opportunity to do so before the researcher gets its source code. This is because CDT believes that the researcher is predicting according to a causally disconnected copy of itself, and therefore it does not believe that its actions can affect the copy. (That is, if CDT knows it is going to be retro blackmailed, and considers this before the researcher gets access to its source code, then it still doesn't precommit.) The failure here is that CDT only reasons according to what it can causally affect, but in the real world decision algorithms also need to worry about what they can logically affect (For example, two agents created while spacelike separated should be able to cooperate on a Prisoner's Dilemma.)

Your attempted patch (pretend you made your precommitments earlier in time) only works when the neglected logical relationships stem from a causal event earlier in time. This is often but not always the case. For instance, if CDT thinks that its clone was causally copied from its own source code, then you can get the right answer by acting as CDT would have precommitted to act before the copying occurred. But two agents written in spacelike separation from each other might have decision algorithms that are logically correlated, despite there being no causal connection no matter how far back you go.

In order to get the right precommitments in those sorts of scenarios, you need to formalize some sort of notion of "things the decision algorithm's choice logically affects," and formalizing "logical effects" is basically the part of the problem that remains difficult :-)

Elliot_Olds11y00

In the retro blackmail, CDT does not precommit to refusing even if it's given the opportunity to do so before the researcher gets its source code.

To clarify: you mean that CDT doesn't precommit at time t=1 even if the researcher hasn't gotten the code representing CDT's state at time t=0 yet. The CDT doesn't think precommitting will help because it knows the code the researcher will get will be from before its precommitment. I agree that this is true, and a CDT won't want to precommit.

I guess my definition even after my clarification is ambiguous, as i... (read more)

See in context