5 dollars is better than 10 dollars
The 5-10 Problem is a strange issue in which an agent reasoning about itself makes an obviously wrong choice.
Our agent faces a truly harrowing choice: it must decide between taking $5 (utility 5) and taking $10 (utility 10).
How will our agent solve this dilemma? First, it will spend some time looking for a proof that taking $5 is better than taking $10. If it can find one, it will take the $5. Otherwise, it will take the $10.
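In code, the procedure might look something like this sketch (the provable proof-search oracle is a placeholder introduced here, not something the problem statement supplies):

```python
def agent(provable):
    """Sketch of the 5-10 agent. `provable(claim)` stands in for a proof search
    over the agent's and the universe's source code (a hypothetical placeholder)."""
    # Spend some time looking for a proof that taking $5 is better than taking $10.
    if provable("taking $5 is better than taking $10"):
        return 5   # proof found: take the $5
    return 10      # no proof found: take the $10
```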
Fair enough, you think. Surely the agent will concede that it can't prove taking $5 is better than taking $10. Then, it will do the sensible thing and take the $10, right?
Wrong.
Our agent finds the following proof that taking $5 is better:
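(In the notation below, □P means "there is a proof of P", A() is the action the agent takes, and U() is the utility it receives; this is just one way of writing the proof down.)

$$
\begin{aligned}
&1.\ \ A() = 5 \rightarrow U() = 5\\
&2.\ \ A() = 5 \rightarrow (A() = 10 \rightarrow U() = 0)\\
&3.\ \ \Box\big[(A() = 5 \rightarrow U() = 5) \land (A() = 10 \rightarrow U() = 0)\big] \rightarrow A() = 5\\
&4.\ \ \Box\big[(A() = 5 \rightarrow U() = 5) \land (A() = 10 \rightarrow U() = 0)\big] \rightarrow \big[(A() = 5 \rightarrow U() = 5) \land (A() = 10 \rightarrow U() = 0)\big]\\
&5.\ \ (A() = 5 \rightarrow U() = 5) \land (A() = 10 \rightarrow U() = 0)\\
&6.\ \ A() = 5
\end{aligned}
$$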
Let's go over the proof.
Line 1: Taking $5 gives you $5.
Line 2: If F is true, then ~F->x is true for any x.
Line 3: If you find a proof that taking $5 gives you $5 and taking $10 gives you $0, you'd take the $5.
Line 4: Combine the three previous lines.
Line 5: Löb's Theorem: line 4 shows that the statement's provability implies the statement, so the statement itself is provable.
Line 6: Knowing that taking $5 gives you $5 and taking $10 gives you $0, you happily take the $5.
Simplified Example
To understand what went wrong, we'll consider a simpler example. Suppose you have a choice between drinking coffee (utility 1) and killing yourself (utility -100).
You decide to use the following algorithm: "if I can prove that I will kill myself, then I'll kill myself. Otherwise, I'll drink coffee".
And because a proof that you'll kill yourself implies that you'll kill yourself, by Löb's Theorem, you will kill yourself.
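Spelled out, with K standing for "I will kill myself" and □K for "there is a proof that I will kill myself" (notation introduced just for this example), the algorithm guarantees that □K → K holds and is provable, and Löb's Theorem does the rest:

$$
\vdash \Box K \rightarrow K \quad\Longrightarrow\quad \vdash K.
$$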
Here, it is easier to see what went wrong: proving that you'll kill yourself is not a good reason to kill yourself.
This same mistake is hidden in the original 5-10 problem: the first conditional is equivalent to "if I can prove I will take $5, then I'll take $5".
Hopefully, it's now clearer what went wrong. How can we fix it?
Solution?
I once saw a comment suggesting that the agent instead reason about how a similar agent would act (I can't find it anymore, sorry). However, this notion was not formalized. I propose the following formalization:
We construct an agent A. Each time A makes a decision, it increments an internal counter i, giving each decision a unique identity. A uses the following procedure to make decisions: for each action a, it considers the agent A_a. A_a is a copy of A (from when it was created), except that if A_a would make a decision with id i, it instead immediately takes action a. Then, if A can prove any of these agents has the maximum expected utility, it chooses the action corresponding to that agent.
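A rough sketch of what this could look like in code; the proof search is abstracted into a callback provably_best, and that callback, along with every name below, is an illustrative choice rather than something defined above:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ModifiedCopy:
    """A copy of the agent as it was at creation, except that on reaching the
    decision with id `decision_id` it immediately takes `forced_action`."""
    original_source: str
    decision_id: int
    forced_action: int


@dataclass
class Agent:
    source: str               # the agent's own definition, fixed at creation
    actions: tuple = (5, 10)
    counter: int = 0          # incremented per decision, so each decision has a unique id

    def decide(self, provably_best: Callable[[ModifiedCopy, Iterable[ModifiedCopy]], bool]) -> Optional[int]:
        self.counter += 1
        decision_id = self.counter

        # For each action a, build A_a: a copy of this agent from when it was
        # created, with decision `decision_id` hard-coded to a.
        candidates = [ModifiedCopy(self.source, decision_id, a) for a in self.actions]

        # If we can prove some candidate achieves the maximum expected utility,
        # take the action that candidate is hard-coded to take.
        for candidate in candidates:
            if provably_best(candidate, candidates):
                return candidate.forced_action
        return None  # the post leaves the no-proof case unspecified
```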
How would f() map 10 to 0? Wouldn't that require that, from the axioms f(10) starts with, there's a proof of U() = 0?
My understanding is that in the original formulation, the agent takes its own definition along with a description of the universe and looks for proofs of the form (A() = 5 -> U() = x) and (A() = 10 -> U() = y). But since "A()" is the same on both sides of the expression, one of the implications is guaranteed to be vacuously true. So the output of the program depends on the order in which it looks for proofs. But here f looks for theorems starting from different axioms depending on its input, so "A()" and "U()" in f(5) can be different from "A()" and "U()" in f(10).