In this article cousin_it describes an algorithm, using a Turing Oracle, which gives a working model of UDT. This algorithm seems to work well in the case of a utility function which takes natural numbers or rationals as values but, as I will explain, the algorithm doesn't actually work for utilities given by real numbers; I will also propose an algorithm which doesn't use an oracle and which seems to wield the correct answer given sufficient computing time.
One of the hidden assumptions in the initial post is that comparison of utilities is decidable (at least decidable with a Turing oracle): in step 3 of the algorithm you are supposed to return the action that corresponds to the highest utility found on step 2 of the algorithm, but if comparison of utilities isn't actually decidable you won't be able (in the general case) to do that and as it happens that equality of real numbers isn't decidable (even using an oracle). Also there are no canonical forms for real numbers like there are for natural or rational numbers so you can have a situation where the order relationship between different real numbers depends on the value of A().
A simple modification of the original algorithm can deal with this problem:
- Enumerate proofs of statements of the form A()=a ⇒ U()≥u, where u is a rational number in canonical form, for t time steps.
- Return the action that corresponds to the highest utility found on step 2.
Why does this works? Because if the formal system used is sound, when the formal system proves that A()=a ⇒ U()≥u where a is the action which ends up actually being taken then U() must actually end up being at least u.
So for example, in the case of Newcomb's problem if t is sufficiently high it will eventually be proven that A()=1 ⇒ U()≥1 000 000 and so, if the formal system used is sound, the utility which ends up being received by the agent will have to be at least 1 000 000 which is of course only possible if the agent one boxes so that's what it will end up doing.
Thanks, this looks mostly right to me. Non-oracle algorithms with time limits were actually the first formulation we came up with, and then the oracle setting was mostly motivated by trying to devise an algorithm that didn't have an arbitrary time limit. So it might be fruitful to try and remove the time limit from your idea too.
I wonder if there's some way to maximize utility "implicitly" without having to mention concrete values in the proofs, e.g. by trying to prove statements of the form "if one-boxing leads to some unknown utility U, then two-boxing leads to utility less than U". This exact form doesn't seem to work, but maybe something similar can be made to work? Or if it can't be made to work, maybe there's some nice impossibility result lurking around the corner? Thanks for taking the time to think about this.