SilasBarta comments on The Absent-Minded Driver - Less Wrong

27 Post author: Wei_Dai 16 September 2009 12:51AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (139)

You are viewing a single comment's thread. Show more comments above.

Comment author: SilasBarta 17 September 2009 08:59:38PM 3 points [-]

By the way, I went ahead and calculated what effect probabilistic knowledge of one's current intersection has on the payoff, so as to know the value of this knowledge. So I calculated the expected return, given that you have a probability r (with 0.5<r<=1) of correctly guessing what intersection you're in, and choose the optimal path based on whichever is most likely.

In the original problem the max payoff (under p=2/3) is 4/3. I found that to beat that, you only need r to be greater than 52.05%, barely better than chance. Alternately, that's only 0.0012 bits of the 1 bit of information contained by the knowledge of which intersection you're at! (Remember that if you have less than .0012 bits, you can just revert to the p=2/3 method from the original problem, which is better than trying to use your knowledge.)

Proof: At X, you have probability r of continuing, then at Y you have a probability r of exiting and (1-r) of continuing.

Thus, EU(r) = r(4r + (1-r) ) = r(3r+1).

Then solve for when EU(r)=4/3, the optimum in the fully ignorant case, which is at r = 0.5205.

Comment author: Wei_Dai 17 September 2009 10:30:06PM *  11 points [-]

You made a mistake here, which is assuming that when you guess you are at X, you should choose CONTINUE with probability 1, and when you guess you are at Y, you should choose EXIT with probability 1. In fact you can improve your expected payoff using a mixed strategy, in which case you can always do better when you have more information.

Here's the math. Suppose when you are at an intersection, you get a clue that reads either 'X' or 'Y'. This clue is determined by a dice roll at START. With probability .49, you get 'X' at both intersections. With probability .49, you get 'Y' at both intersections. With probability .02, you get 'X' at the X intersection, and 'Y' at the Y intersection.

Now, at START, your decision consists of a pair <p,q> of probabilities, where p is your probability to CONTINUE after seeing 'X', and q is your probability to CONTINUE after seeing 'Y'. Your expected payoff is:

.02 * (p*q + 4*(p*(1-q))) + .49 * (p*p + 4*(p*(1-p))) + .49 * (q*q + 4*(q*(1-q)))

which is maximized at p=0.680556, q=0.652778. And your expected payoff is 1.33389 which is > 4/3.

Comment author: SilasBarta 17 September 2009 11:11:22PM *  0 points [-]

Wow, good catch! (In any case, I had realized that if you have probability less than 52.05%, you shouldn't go with the most likely, but rather, revert the original p=2/3 method at the very least.)

The formula you gave for the mixed strategy (with coefficients .02, .49, .49) corresponds to a 51% probability of guessing right at any given light. (If the probability of guessing right is r, the coefficients should be 2r-1,1-r,1-r.) It actually raises the threshold for which choosing based on the most probable, with no other randomness, becomes the better strategy, but not by much -- just to about 52.1%, by my calculations.

So that means the threshold is 0.0013 bits instead of 0.0012 :-P

(Yeah, I did guess and check because I couldn't think of a better way on this computer.)

Comment author: Wei_Dai 17 September 2009 11:20:54PM *  4 points [-]

I think you might still be confused, but the nature of your confusion isn't quite clear to me. Are you saying that if r>52.1%, the best strategy is a pure one again? That's not true. See this calculation with coefficients .2, .4, .4.

ETA: Also, I think talking about this r, which is supposed to be a guess of "being at X", is unnecessarily confusing, because how that probability should be computed from a given problem statement, and whether it's meaningful at all, are under dispute. I suggest thinking in terms of what you should plan to do when you are at START.

Comment author: SilasBarta 18 September 2009 12:55:06AM *  0 points [-]

I think you might still be confused, but the nature of your confusion isn't quite clear to me. Are you saying that if r>52.1%, the best strategy is a pure one again? That's not true. See this calculation with coefficients .2, .4, .4.

That yields a payoff of 1.42, which is less than what the pure strategy gives in the equivalent case corresponding to .2/.4/.4, which is a 60% chance of guessing right. Since the payoff is r*(3r+1), the situation you described has a payoff of 1.68 under a pure strategy of choosing based on your best guess.

ETA: Also, I think talking about this r, which is supposed to be a guess of "being at X", is unnecessarily confusing, because how that probability should be computed from a given problem statement, and whether it's meaningful at all, are under dispute.

I specifically avoided defining r as the probability of "being at X"; r is the probability of guessing correctly (and therefore of picking the best option as if it were true), whichever signal you're at, and it's equivalent to choosing 2r-1,r-1,r-1 as the coefficients in your phrasing. The only thing possibly counterintuitive is that your ignorance maximizes at r=0.5 rather than zero. Less than 50%, and you just flip your prediction.

Comment author: Wei_Dai 18 September 2009 01:08:31AM 2 points [-]

Since the payoff is r*(3r+1), the situation you described has a payoff of 1.68 under a pure strategy of choosing based on your best guess.

No, it doesn't. This is what I meant by "r" being confusing. Given .2/.4/.4, if you always pick CONTINUE when you see hint 'X' and EXIT when you see hint 'Y', your expected payoff (computed at START) is actually:

.4 * 0 + .4 * 1 + .2 * 4 = 1.2.

Comment author: SilasBarta 18 September 2009 01:22:04AM 0 points [-]

Incorrect. Given .2/.4/.4, you will see X 60% of the time at X, and Y 60% of the time at Y. So your payoff, computed at START, is:

.4 * 0 + .6 * (4 *.6 + .4* 1) = 1.68

You seem to be treating .2/.4/.4 as being continue-exit/exit-exit/continue-continue, which isn't the right way to look at it.

Comment author: Wei_Dai 18 September 2009 01:41:59AM 2 points [-]

Please go back to what I wrote before (I've changed the numbers to .2/.4/.4 below):

Suppose when you are at an intersection, you get a clue that reads either 'X' or 'Y'. This clue is determined by a dice roll at START. With probability .4, you get 'X' at both intersections. With probability .4, you get 'Y' at both intersections. With probability .2, you get 'X' at the X intersection, and 'Y' at the Y intersection.

I'll go over the payoff calculation in detail, but if you're still confused after this, perhaps we should take it to private messages to avoid cluttering up the comments.

Your proposed strategy is to CONTINUE upon seeing the hint 'X' and EXIT upon seeing the hint 'Y'. With .4 probability, you'll get 'Y' at both intersections, but you EXIT upon seeing the first 'Y' so you get 0 payoff in that case. With .4 probability, you get 'X' at both intersections, so you choose CONTINUE both times and end up at C for payoff 1. With .2 probability, you get 'X' at X and 'Y' at Y, so you choose CONTINUE and then EXIT for a payoff of 4, thus:

.4 * 0 + .4 * 1 + .2 * 4 = 1.2

Comment author: SilasBarta 18 September 2009 04:52:40AM *  1 point [-]

I'm not confused; I probably should have stopped you at your original derivation for the partial-knowledge case but didn't want to check your algebra. And setting up problems like these is important and tricky, so this discussion belongs here.

So, I think the problem with your setup is that you don't make the outcome space fully symmetric because you don't have an equal chance of drawing Y at X and X at Y (compared to your chance of drawing X at X and Y at Y).

To formalize it for the general case of partial knowledge, plus probabilistic knowledge given action, we need to look at four possibilities: Drawing XY, XX, YY, and YX, only the first of which is correct. If, as I defined it before, the probability of being right at any given exit is r, the corresponding probabilities are: r^2, r(1-r), r(1-r), and (1-r)(1-r).

So then I have the expected payoff as a function of p, q, and r as:

(r^2)(p*q + 4p*(1 - q)) + r(1 - r)(p^2 + 4p(1 - p)) +r(1 - r)(q^2 + 4q(1 - q)) + (1 - r)(1 - r)(p*q + 4q(1 - p))

This nicely explains the previous results:

The original problem is the case of complete ignorance, r=1/2, which has a maximum 4/3 where p and q are such that they average out to choosing "continue" at one's current intersection 2/3 of the time. (And this, I think, shows you how to correctly answer while explicitly and correctly representing your probability of being at a given intersection.)

The case of (always) continuing on (guessing) X and not continuing on (guessing) Y corresponds to p=1 and q=0, which reduces to r*(3r+1), the equation I originally had.

Furthermore, it shows how to beat the payoff of 4/3 when your r is under 52%. For 51% (the original case you looked at), the max payoff is 1.361 {p=1, q=0.306388} (don't know how to show it on Alpha, since you have to constrain p and q to 0 thru 1).

Also, it shows I was in error about the 52% threshold, and the mixed strategy actually dominates all the way up to about r = 61%, at which point of course p=1 and q has shrunk to zero (continue when you see X, exit when you see Y). This corresponds to 32 millibits, much higher than my earlier estimate of 1.3 millibits.

Interesting!

Comment author: loqi 25 September 2009 07:07:16AM *  0 points [-]

I'm pretty sure Wei Dai is correct. I'll try and explain it differently. Here's a rendering of the problem in some kind of pseudolisp:

(start (p q)
(0.4 "uninformative-x"
(p "continue"
(p "continue" 1)
(else "exit" 4))
(else "exit" 0))
(0.4 "uninformative-y"
(q "continue"
(q "continue" 1)
(else "exit" 4))
(else "exit" 0))
(0.2 "informative"
(p "continue"
(q "continue" 1)
(else "exit" 4))
(else "exit" 0)))

Now evaluate with the strategy under discussion, (start 1 0):

(0.4 "uninformative-x"
(1 "continue"
(1 "continue" 1)
(0 "exit" 4))
(0 "exit" 0))
(0.4 "uninformative-y"
(0 "continue"
(0 "continue" 1)
(1 "exit" 4))
(1 "exit" 0))
(0.2 "informative"
(1 "continue"
(0 "continue" 1)
(1 "exit" 4))
(0 "exit" 0))

Prune the zeros:

(0.4 "uninformative-x"
(1 "continue"
(1 "continue" 1)))
(0.4 "uninformative-y"
(1 "exit" 0))
(0.2 "informative"
(1 "continue"
(1 "exit" 4)))

Combine the linear paths:

(0.4 "uninformative-x/continue/continue" 1)
(0.4 "uninformative-y/exit" 0)
(0.2 "informative/continue/exit" 4)

You seem to be treating .2/.4/.4 as being continue-exit/exit-exit/continue-continue, which isn't the right way to look at it.

I'd be interested in seeing what you think is wrong with the above derivation, ideally in terms of the actual decision problem at hand. Remember, p and q are decision parameters. They parameterize an agent, not an expectation. When p and q are 0 or 1, the agent is essentially a function of type "Bool -> Bool". How could a stateless agent of that type implement a better strategy than limiting itself to those three options?

Comment author: SilasBarta 25 September 2009 12:26:21PM 0 points [-]

Again, what's wrong with that derivation is it leaves out the possibility of "disinformative", and therefore assumes more knowledge about your intersection than you can really have. (By zeroing the probability of "Y then X" it concentrates the probability mass in a way that decreases the entropy of the variable more than your knowledge can justify.)

In writing the world-program in a way that categorizes your guess as "informative", you're implicitly assuming some memory of what you drew before: "Okay, so now I'm on the second one, which shows the Y-card ..."

Now, can you explain what's wrong with my derivation?

Comment author: loqi 25 September 2009 06:00:00PM 1 point [-]

Again, what's wrong with that derivation is it leaves out the possibility of "disinformative"

By "disinformative", do you mean that intersection X has hint Y and vice versa? This is not possible in the scenario Wei Dai describes.

In writing the world-program in a way that categorizes your guess as "informative"

Ah, this seems to be a point of confusion. The world program does not categorize your guess, at all. The "informative" label in my derivation refers to the correctness of the provided hints. Whether or not the hints are both correct is a property of the world.

you're implicitly assuming some memory of what you drew before: "Okay, so now I'm on the second one, which shows the Y-card ..."

No, I am merely examining the possible paths from the outside. You seem to be confusing the world program with the agent. In the "informative/continue/exit" case, I am saying "okay, so now the driver is on the second one". This does not imply that the driver is aware of this fact.

Now, can you explain what's wrong with my derivation?

I think so. You're approaching the problem from a "first-person perspective", rather than using the given structure of the world, so you're throwing away conditional information under the guise of implementing a stateless agent. But the agent can still look at the entire problem ahead of time and make a decision incorporating this information without actually needing to remember what's happened once he begins.

At the first intersection, the state space of the world (not the agent) hasn't yet branched, so your approach gives the correct answer. At the second intersection, we (the authors of the strategy, not the agent) must update your "guess odds" conditional on having seen X at the first intersection.

Your tree was:

(0.4 "exit" 0)
(0.6 "continue"
(0.6 "exit" 4)
(0.4 "continue" 1))

The outer probabilities are correct, but the inner probabilities haven't been conditioned on seeing X at the first intersection. Two out of three times that the agent sees X at the first intersection, he will see X again at the second intersection. So, assuming the p=1 q=0 strategy, the statement "Given .2/.4/.4, you will see Y 60% of the time at Y" is false.