Let's consider the agent given in A model of UDT with a halting oracle. One will notice that that agent is not quite well defined because it doesn't tell us in what order we are supposed to consider actions in step 1. But surely that doesn't matter, right? Wrong.
Let's consider the prisoner dilemma with payment matrix given by
1: C | 1: D | |
2: C | (3, 3) | (5, 0) |
2: D | (0, 5) | (2, 2) |
and consider agent A which consider whether there is a proof that A()≠D before considering whether there is a proof that A()≠C and agent A' which do things in the opposite order. If A or A' is pitted against itself everything is well and mutual cooperation is the result of the game but what if A is pitted against A'? Then A break down and cry.
Let's call the utility functions of A U and the utility function of A' U' and consider a model of PA in which PA is inconsistent (such a model must exist if PA is consistent). In such a model we will have A()=D and A'()=C and so U()=5 and U'()=0. That means that A will not be able to prove that A()=D => U()=u for any u different from 5 and so either A will defect and A' will cooperate or A will break down and cry, but A' will not cooperate because it cannot prove A'()=C => U()=u' for any u' except possibly 0, so A will break down and cry. QED
More generally if M is a model of PA in which PA is inconsistent, an agent defined in this way will never be able to prove that A()=a => U()=u (where a is the first action considered in step 1) except possibly for u=u0 where u0 is the value of U() in M. That seems to create a huge problem for that approach to UDT.
If I understand what you're proposing correctly then that doesn't work either.
Suppose it is a theorem of PA that this algorithm return an error, then all theorems of the form A()=a => U()>=u are demonstrable in PA and so there is no action with the highest u and the algorithm return an error, because that is demonstrable in PA the algorithm must return an error by Löb's theorem.
You're right, and I don't know how to fix the problem without adding the equivalent of "playing chicken". But now I wonder why the "playing chicken" step needs to consider the possible actions one at a time. Instead suppose that using an appropriate oracle, the agent looks through all theorems of PA, and returns action a as soon as it sees a theorem of the form A()≠a, for any a. This was my original interpretation of cousin_it's post. Does this have the same problem, if two otherwise identical agents look through the theorems of PA in different orders?