That's funny. What you described in the second paragraph is something like a 2-player bimatrix game played across time and space in which the players aren't even sure of their opponents' existence and in which our player's strategy is which decision theory he uses.
Very interesting, and great food for thought. But again, the complication comes from the possible existence of another player. I would argue that it's reasonable to assume we have some 'breathing room' of one to two million years before we have to deal with other players. In that case, why not build a 'naive' FAI which operates under the assumption that there is no other player, let it grow, and then, when it has some free time, let it think of a decision theory for you? (I don't know whether you speak for the SIAI, cousin_it, but I think it would be fair for an outsider to wonder why Yudkowsky thinks this route in particular has the best cost/benefit ratio in terms of achieving FAI as fast as possible.)
I'm not affiliated with SIAI in any way. Just like you, I'm an outsider trying to clear up these topics for my own satisfaction :-)
Many people here think that we must get FAI right on the first try, because after it gains power it will resist our attempts to change it. If you code into the AI the assumption that it's the only player, it won't believe in other players even when it sees them, and will keep allocating resources to building beautiful gardens even as alien ships are circling overhead (metaphorically speaking). When you ask it to build some guns...
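One way to make the "it won't believe in other players even when it sees them" point concrete is to read the hard-coded assumption as a prior probability of exactly zero under Bayesian updating. That reading is an assumption on my part, and the function names and likelihood numbers below are made up for illustration; this is a toy sketch, not a claim about how any real AI would be built.

```python
def posterior(prior: float, p_obs_if_true: float, p_obs_if_false: float) -> float:
    """One Bayesian update of P(other players exist) after a single observation."""
    numerator = prior * p_obs_if_true
    denominator = numerator + (1.0 - prior) * p_obs_if_false
    return numerator / denominator


def belief_after(prior: float, n_observations: int) -> float:
    """Update repeatedly on evidence that strongly favors 'other players exist'."""
    belief = prior
    for _ in range(n_observations):
        # Hypothetical likelihoods: each sighting of an "alien ship" is 99x more
        # probable if other players exist than if they do not.
        belief = posterior(belief, p_obs_if_true=0.99, p_obs_if_false=0.01)
    return belief


# An agent that hard-codes "I am the only player" (prior exactly 0) never moves.
print(belief_after(0.0, 20))
# An agent that merely finds other players very unlikely is quickly convinced.
print(belief_after(1e-9, 20))
```

The difference between the two agents is only the prior: any nonzero prior can be overwhelmed by evidence, while a prior of exactly zero cannot.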
I am posting this because I'm interested in self-modifying agent decision theory but I'm too lazy to read up on existing posts. I want to see a concise justification as to why a sophisticated decision theory would be needed for the implementation of an AGI. So I'll present a 'naive' decision theory, and I want to know why it is unsatisfactory.
The one condition in the naive decision theory is that the decision-maker is the only agent in the universe who is capable of self-modification. This will probably suffice for the production of the first Artificial General Intelligence (since humans aren't actually all that good at self-modification).
Suppose that our AGI has a probability model for predicting the 'state of the universe at time T' (e.g. T = 10 billion years), conditional on what it knows and on one decision it has to make. This one decision is how it should rewrite its code at time zero. We suppose it can rewrite its code instantly, and that the code is limited to X bytes. So the AGI has to maximize utility at time T over all programs of X bytes. Supposing it can simulate its utility at the 'end state of the universe' conditional on which program it chooses, why can't it just choose the program with the highest utility? Implicit in our set-up is that the program it chooses may (and very likely will) have the capacity to self-modify again, but we're assuming that our AGI's probability model accounts for when and how it is likely to self-modify. Difficulties with infinite recursion loops should be avoidable if our AGI backtracks from the end of time.
Of course our AGI will need a probability model for predicting what a program for its behavior will do without having to simulate or even completely specify the program. To me, that seems like the hard part. If this is possible, I don't see why it's necessary to develop a specific theory for dealing with convoluted Newcomb-like problems, since the above seems to take care of those issues automatically.
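To pin down what the naive procedure amounts to, here is a minimal sketch, assuming the probability model can be queried as a function from a candidate program to a distribution over end states. Everything named here (`expected_utility`, `choose_program`, `toy_model`, `toy_utility`, the three candidate programs) is hypothetical and exists only for illustration; a real search over all X-byte programs would be astronomically large, and the toy model stands in for exactly the part described above as hard.

```python
from typing import Callable, Dict, Iterable

Program = bytes
EndState = str
Model = Callable[[Program], Dict[EndState, float]]  # program -> P(end state)
Utility = Callable[[EndState], float]


def expected_utility(program: Program, model: Model, utility: Utility) -> float:
    """Expected utility at time T if the agent rewrites itself as `program`."""
    return sum(prob * utility(state) for state, prob in model(program).items())


def choose_program(candidates: Iterable[Program], model: Model, utility: Utility) -> Program:
    """The naive decision rule: pick the rewrite with the highest expected utility."""
    return max(candidates, key=lambda prog: expected_utility(prog, model, utility))


def toy_model(program: Program) -> Dict[EndState, float]:
    # Made-up stand-in for the probability model: longer programs are simply
    # assumed to be more likely to lead to the "good" end state.
    p_good = min(1.0, len(program) / 4)
    return {"good": p_good, "bad": 1.0 - p_good}


def toy_utility(state: EndState) -> float:
    return 1.0 if state == "good" else 0.0


candidates = [b"a", b"ab", b"abc"]  # stand-ins for "all programs of X bytes"
best = choose_program(candidates, toy_model, toy_utility)
print(best)
```

Note that the model is treated as depending only on which program is chosen. That is where the single-agent assumption enters: nothing else in the universe is modeled as reacting to, or predicting, the agent's choice of program.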