I've recently read the decision theory FAQ, as well as Eliezer's TDT paper. When reading the TDT paper, a simple decision procedure occurred to me which, as far as I can tell, gets the correct answer to every tricky decision problem I've seen. As discussed in the FAQ, evidential decision theory gets the chewing gum problem wrong, causal decision theory gets Newcomb's problem wrong, and TDT gets counterfactual mugging wrong.
In the TDT paper, Eliezer postulates an agent named Gloria (page 29), who is defined as an agent who maximizes decision-determined problems. He describes how a CDT-agent named Reena would want to transform herself into Gloria. Eliezer writes:
By Gloria’s nature, she always already has the decision-type causal agents wish they had, without need of precommitment.
Eliezer then goes on to develop TDT, which is supposed to construct Gloria as a byproduct.
Gloria, as we have defined her, is defined only over completely decision-determined problems of which she has full knowledge. However, the agenda of this manuscript is to introduce a formal, general decision theory which reduces to Gloria as a special case.
Why can't we instead construct Gloria directly, using the idea of the thing that CDT agents wished they were? Obviously we can't just postulate a decision algorithm that we don't know how to execute, and then note that a CDT agent would wish they had that decision algorithm, and pretend we had solved the problem. We need to be able to describe the ideal decision algorithm to a level of detail that we could theoretically program into an AI.
Consider this decision algorithm, which I'll temporarily call Nameless Decision Theory (NDT) until I get feedback about whether it deserves a name: you should always make the decision that a CDT-agent would have wished he had precommitted to, if he had previously known he'd be in his current situation and had the opportunity to precommit to a decision.
In effect, you are making a general precommitment to behave as if you had made all specific precommitments that would ever be advantageous to you.
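To make the idea concrete, here is a minimal sketch of how NDT might play out on Newcomb's problem. It assumes a perfectly accurate predictor and the usual $1,000,000 / $1,000 stakes; the function names (`payoff`, `cdt_choice`, `ndt_choice`) are made up for illustration and are not from the TDT paper.

```python
# Toy Newcomb's problem. Assumption: the predictor is perfectly accurate and
# fills the opaque box with BIG only if it predicts one-boxing.
BIG, SMALL = 1_000_000, 1_000
ACTIONS = ["one-box", "two-box"]

def payoff(action, predicted):
    """Money received given the action taken and the predictor's earlier guess."""
    opaque = BIG if predicted == "one-box" else 0
    transparent = SMALL if action == "two-box" else 0
    return opaque + transparent

def cdt_choice(predicted):
    """CDT at decision time: the prediction is already fixed, so only the
    causal effect of the action on the payoff is compared."""
    return max(ACTIONS, key=lambda a: payoff(a, predicted))

def ndt_choice():
    """NDT sketch: choose the policy a CDT agent would have precommitted to
    beforehand, when the precommitment itself determines the prediction."""
    return max(ACTIONS, key=lambda policy: payoff(policy, predicted=policy))

print(cdt_choice("one-box"), cdt_choice("two-box"))  # CDT two-boxes either way
print(ndt_choice())                                  # the precommitted policy
```

In this toy model the precommitment step is itself an ordinary CDT calculation; the only change is that it is evaluated at an earlier time, when the choice of policy still causally influences the prediction.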
NDT is so simple, and Eliezer comes so close to stating it in his discussion of Gloria, that I assume there is some flaw with it that I'm not seeing. Perhaps NDT does not count as a "real"/"well defined" decision procedure, or can't be formalized for some reason? Even so, it does seem like it'd be possible to program an AI to behave in this way.
Can someone give an example of a decision problem for which this decision procedure fails? Or for which there are multiple possible precommitments that you would have wished you'd made and it's not clear which one is best?
EDIT: I now think this definition of NDT better captures what I was trying to express: You should always make the decision that a CDT-agent would have wished he had precommitted to, if he had previously considered the possibility of his current situation and had the opportunity to costlessly precommit to a decision.
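One way to see why the revised wording might matter is counterfactual mugging. The sketch below is only my reading of the difference between the two definitions, under assumed stakes (a fair coin, a $100 request on tails, a $10,000 reward on heads if the agent would have paid); `outcome` and `best_policy` are hypothetical helper names.

```python
# Toy counterfactual mugging. Assumed stakes: on tails the agent is asked for
# ASK; on heads it receives REWARD only if its policy is to pay on tails.
ASK, REWARD = 100, 10_000

def outcome(pays_on_tails, coin):
    """Payoff of the policy 'pay on tails' (True/False) for a given coin flip."""
    if coin == "tails":
        return -ASK if pays_on_tails else 0
    return REWARD if pays_on_tails else 0

def best_policy(coin_probs):
    """Policy a CDT agent would precommit to, given its beliefs about the coin
    at the moment of (costless) precommitment."""
    def expected_value(policy):
        return sum(p * outcome(policy, coin) for coin, p in coin_probs.items())
    return max([False, True], key=expected_value)

# Original wording: the agent "had previously known" it would face tails.
knew_tails = best_policy({"tails": 1.0})

# Revised wording: the agent only "considered the possibility" of tails.
considered_tails = best_policy({"heads": 0.5, "tails": 0.5})

print(knew_tails, considered_tails)
```

Under these assumptions, an agent that is certain of tails precommits to refusing, while an agent that merely weighs tails as one possibility precommits to paying, since paying has an expected value of 0.5 × 10,000 − 0.5 × 100 = 4,950.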
Nope! That's the open part of the problem :-) We don't know how to build a decision network with logical nodes, and we don't know how to propagate a "logical update" between nodes. (That is, we don't have a good formalism of how changing one algorithm logically affects a related but non-identical algorithm.)
If we had the latter thing, we wouldn't even need the "logical decision network", because we could just ask "if I change the agent, how does that logically affect the universe?" (as both are algorithms). This idea is the basis of proof-based UDT, which tries to answer the problem by searching for proofs under the assumption "Agent()=a" for various actions. Proof-based UDT has lots of problems of its own, though, and thinking about logical updates in logical graphs is a fine angle of approach.
Thanks. I had one question about your Toward Idealized Decision Theory paper.
I can't say I fully understand UDT, but the 'updateless' part does seem very similar to the "act as if you had precommitted to any action that you'd have wanted to precommit to" core idea of NDT. It's not clear to me that the super-powerful UDT agent would make the wrong decision in the game where two players each pick a number between 0 and 10 and get payouts based on their pick and the total sum.
Wouldn't the UDT reason as follows? "If my algorithm were such that I wouldn't just ...