5 dollars is better than 10 dollars
The 5-10 Problem is a strange issue in which an agent reasoning about itself makes an obviously wrong choice.
Our agent faces a truly harrowing choice: it must decide between taking $5 (utility 5) and taking $10 (utility 10).
How will our agent solve this dilemma? First, it will spend some time looking for a proof that taking $5 is better than taking $10. If it can find one, it will take the $5. Otherwise, it will take the $10.
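In code, the procedure might look something like this sketch (the provable proof-search oracle is a placeholder introduced here, not something the problem statement supplies):

```python
def agent(provable):
    """Sketch of the 5-10 agent. `provable(claim)` stands in for a proof search
    over the agent's and the universe's source code (a hypothetical placeholder)."""
    # Spend some time looking for a proof that taking $5 is better than taking $10.
    if provable("taking $5 is better than taking $10"):
        return 5   # proof found: take the $5
    return 10      # no proof found: take the $10
```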
Fair enough, you think. Surely the agent will concede that it can't prove taking $5 is better than taking $10. Then, it will do the sensible thing and take the $10, right?
Wrong.
Our agent finds the following proof that taking $5 is better:
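(In the notation below, □P means "there is a proof of P", A() is the action the agent takes, and U() is the utility it receives; this is just one way of writing the proof down.)

$$
\begin{aligned}
&1.\ \ A() = 5 \rightarrow U() = 5\\
&2.\ \ A() = 5 \rightarrow (A() = 10 \rightarrow U() = 0)\\
&3.\ \ \Box\big[(A() = 5 \rightarrow U() = 5) \land (A() = 10 \rightarrow U() = 0)\big] \rightarrow A() = 5\\
&4.\ \ \Box\big[(A() = 5 \rightarrow U() = 5) \land (A() = 10 \rightarrow U() = 0)\big] \rightarrow \big[(A() = 5 \rightarrow U() = 5) \land (A() = 10 \rightarrow U() = 0)\big]\\
&5.\ \ (A() = 5 \rightarrow U() = 5) \land (A() = 10 \rightarrow U() = 0)\\
&6.\ \ A() = 5
\end{aligned}
$$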
Let's go over the proof.
Line 1: Taking $5 gives you $5.
Line 2: If F is true, then ~F->x is true for any x.
Line 3: If you find a proof that taking $5 gives you $5 and taking $10 gives you $0, you'd take the $5.
Line 4: Combine the three previous lines.
Line 5: Löb's Theorem: line 4 shows that the statement's provability implies the statement, so the statement itself is provable.
Line 6: Knowing that taking $5 gives you $5 and taking $10 gives you $0, you happily take the $5.
Simplified Example
To understand what went wrong, we'll consider a simpler example. Suppose you have a choice between drinking coffee (utility 1) and killing yourself (utility -100).
You decide to use the following algorithm: "if I can prove that I will kill myself, then I'll kill myself. Otherwise, I'll drink coffee".
And because a proof that you'll kill yourself implies that you'll kill yourself, by Löb's Theorem, you will kill yourself.
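Spelled out, with K standing for "I will kill myself" and □K for "there is a proof that I will kill myself" (notation introduced just for this example), the algorithm guarantees that □K → K holds and is provable, and Löb's Theorem does the rest:

$$
\vdash \Box K \rightarrow K \quad\Longrightarrow\quad \vdash K.
$$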
Here, it is easier to see what went wrong: proving that you'll kill yourself is not a good reason to kill yourself.
This same mistake is hidden in the original 5-10 problem: the first conditional is equivalent to "if I can prove I will take $5, then I'll take $5".
Hopefully, it's now clearer what went wrong. How can we fix it?
Solution?
I once saw a comment suggesting that the agent instead reason about how a similar agent would act (I can't find it anymore, sorry). However, this notion was not formalized. I propose the following formalization:
We construct an agent A. Each time A makes a decision, it increments an internal counter i, giving each decision a unique identity. A uses the following procedure to make decisions: for each action a, it considers the agent A_a. A_a is a copy of A (from when it was created), except that if A_a would make a decision with id i, it instead immediately takes action a. Then, if A can prove any of these agents has the maximum expected utility, it chooses the action corresponding to that agent.
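A rough sketch of what this could look like in code; the proof search is abstracted into a callback provably_best, and that callback, along with every name below, is an illustrative choice rather than something defined above:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ModifiedCopy:
    """A copy of the agent as it was at creation, except that on reaching the
    decision with id `decision_id` it immediately takes `forced_action`."""
    original_source: str
    decision_id: int
    forced_action: int


@dataclass
class Agent:
    source: str               # the agent's own definition, fixed at creation
    actions: tuple = (5, 10)
    counter: int = 0          # incremented per decision, so each decision has a unique id

    def decide(self, provably_best: Callable[[ModifiedCopy, Iterable[ModifiedCopy]], bool]) -> Optional[int]:
        self.counter += 1
        decision_id = self.counter

        # For each action a, build A_a: a copy of this agent from when it was
        # created, with decision `decision_id` hard-coded to a.
        candidates = [ModifiedCopy(self.source, decision_id, a) for a in self.actions]

        # If we can prove some candidate achieves the maximum expected utility,
        # take the action that candidate is hard-coded to take.
        for candidate in candidates:
            if provably_best(candidate, candidates):
                return candidate.forced_action
        return None  # the post leaves the no-proof case unspecified
```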
How would f() map 10 to 0? Wouldn't that require that, from the axioms f(10) starts with, there's a proof of U() = 0?
My understanding is that in the original formulation, the agent takes its own definition along with a description of the universe and looks for proofs of the form (A() = 5 -> U() = x) and (A() = 10 -> U() = y). But since "A()" is the same on both sides of the expression, one of the implications is guaranteed to be vacuously true. So the output of the program depends on the order in which it looks for proofs. But here f looks for theorems starting from different axioms depending on its input, so "A()" and "U()" in f(5) can be different from "A()" and "U()" in f(10).