I am posting this because I'm interested in the decision theory of self-modifying agents, but I'm too lazy to read up on the existing posts. I want to see a concise justification for why a sophisticated decision theory would be needed to implement an AGI. So I'll present a 'naive' decision theory, and I want to know why it is unsatisfactory.
The one condition in the naive decision theory is that the decision-maker is the only agent in the universe capable of self-modification. This will probably suffice for the production of the first Artificial General Intelligence (since humans aren't actually all that good at self-modification).
Suppose that our AGI has a probability model for predicting the state of the universe at time T (e.g. T = 10 billion years), conditional on what it knows and on one decision it has to make: how to rewrite its own code at time zero. We suppose it can rewrite its code instantly, and that the code is limited to X bytes. So the AGI has to maximize utility at time T over all programs of X bytes. Supposing it can simulate its utility at the 'end state of the universe' conditional on which program it chooses, why can't it just choose the program with the highest utility? Implicit in this set-up is that the program it chooses may (and very likely will) have the capacity to self-modify again, but we're assuming that our AGI's probability model accounts for when and how it is likely to do so. Difficulties with infinite recursion loops should be avoidable if our AGI backtracks from the end of time.
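To make the naive procedure concrete, here is a minimal sketch (Python, purely illustrative): `predict_utility_at_T` and `candidate_programs` are hypothetical stand-ins for the probability model and the space of X-byte rewrites, and the brute-force loop is obviously not meant literally.

```python
# Toy sketch of the naive decision theory: pick the self-rewrite whose
# predicted utility at time T is highest. `predict_utility_at_T` is a
# stand-in for the probability model, which is assumed to account for any
# further self-modifications the chosen program would make.

def naive_self_rewrite(candidate_programs, predict_utility_at_T):
    best_program, best_utility = None, float("-inf")
    for program in candidate_programs:      # all programs of at most X bytes
        utility = predict_utility_at_T(program)
        if utility > best_utility:
            best_program, best_utility = program, utility
    return best_program
```

Everything difficult is hidden inside `predict_utility_at_T`; the question below is whether anything beyond that prediction step requires a special decision theory.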
Of course, our AGI will need a probability model for predicting what a program governing its behavior will do, without having to simulate or even completely specify that program. To me, that seems like the hard part. If this is possible, I don't see why it's necessary to develop a specific theory for dealing with convoluted Newcomb-like problems, since the above seems to take care of those issues automatically.
Your idea is a little similar to the one-action AI that I described some time ago. It's a neat way to get goal stability (if you trust your magical math intuition module), but IMO it doesn't solve all of decision theory. The tricky question is how to build the "probability model", or assignment of expected utilities to all possible actions. A poorly chosen assignment can turn out to be mathematically true simply because it influences your actions and thereby makes itself true, like a self-fulfilling prophecy.
For example, imagine that an alien AI created a million years ago predicted that humanity would build an AI based on your decision theory, and precommitted to waging self-destructive war against us unless we give it 99% of our resources. Your AI knows everything about physics, so it will infer the existence of the alien AI at "time zero" and immediately give up the resources. But this decision of your AI was itself predicted by the alien AI, just as I predicted it here, and that's why the alien AI made its precommitment in the first place. TDT tries to solve this problem by not giving in to extortion, though we don't know how to formalize that.
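To make the self-fulfilling structure explicit, here's a toy model with made-up payoffs (not a formalization of TDT or anything else): the alien precommits only if it predicts that our AI's policy gives in, and a policy that maximizes utility after the precommitment already exists does give in.

```python
# Toy model of the extortion loop; the numbers are made up.
GIVE_IN = 1          # we keep 1% of our resources
WAR = 0              # self-destructive war
NO_EXTORTION = 100   # the alien never bothers to precommit

def alien_precommits(our_policy):
    # The alien simulates our policy a million years in advance and
    # only precommits to extortion if extortion would pay off.
    return our_policy(extortion_exists=True) == "give in"

def our_payoff(our_policy):
    if alien_precommits(our_policy):
        return GIVE_IN if our_policy(extortion_exists=True) == "give in" else WAR
    return NO_EXTORTION

naive    = lambda extortion_exists: "give in" if extortion_exists else "ignore"
stubborn = lambda extortion_exists: "refuse"

print(our_payoff(naive))     # 1   -- the prediction "it gives in" made itself true
print(our_payoff(stubborn))  # 100 -- refusing makes the extortion never happen
```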
For a more interesting twist, consider that our universe is likely to contain many instances of us (e.g. if it's spatially infinite, which is currently considered plausible), scattered all over spacetime. Will the different copies of your AI cooperate with each other, or will they do something stupid like wage war? UDT tries to solve this problem and others like it by accounting for logical copies implicitly, so they all end up cooperating.
UDT can solve this problem by noticing that a decision not to give in to extortion makes the extortion improbable. TDT won't be able to notice the scenario where the aliens never appear, and so won't solve this problem, for the same reason it doesn't solve Counterfactual Mugging. (Does this mean that TDT doesn't solve Newcomb's problem with transparent boxes? I don't remember hearing that, although I remember Drescher mentioning that CM is analogous ...