Wei_Dai comments on The Absent-Minded Driver - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (139)
I think you might still be confused, but the nature of your confusion isn't quite clear to me. Are you saying that if r>52.1%, the best strategy is a pure one again? That's not true. See this calculation with coefficients .2, .4, .4.
ETA: Also, I think talking about this r, which is supposed to be a guess of "being at X", is unnecessarily confusing, because how that probability should be computed from a given problem statement, and whether it's meaningful at all, are under dispute. I suggest thinking in terms of what you should plan to do when you are at START.
That yields a payoff of 1.42, which is less than what the pure strategy gives in the equivalent case corresponding to .2/.4/.4, which is a 60% chance of guessing right. Since the payoff is r*(3r+1), the situation you described has a payoff of 1.68 under a pure strategy of choosing based on your best guess.
I specifically avoided defining r as the probability of "being at X"; r is the probability of guessing correctly (and therefore of picking the best option as if it were true), whichever signal you're at, and it's equivalent to choosing 2r-1,r-1,r-1 as the coefficients in your phrasing. The only thing possibly counterintuitive is that your ignorance maximizes at r=0.5 rather than zero. Less than 50%, and you just flip your prediction.
No, it doesn't. This is what I meant by "r" being confusing. Given .2/.4/.4, if you always pick CONTINUE when you see hint 'X' and EXIT when you see hint 'Y', your expected payoff (computed at START) is actually:
.4 * 0 + .4 * 1 + .2 * 4 = 1.2.
Incorrect. Given .2/.4/.4, you will see X 60% of the time at X, and Y 60% of the time at Y. So your payoff, computed at START, is:
.4 * 0 + .6 * (4 *.6 + .4* 1) = 1.68
You seem to be treating .2/.4/.4 as being continue-exit/exit-exit/continue-continue, which isn't the right way to look at it.
Please go back to what I wrote before (I've changed the numbers to .2/.4/.4 below):
I'll go over the payoff calculation in detail, but if you're still confused after this, perhaps we should take it to private messages to avoid cluttering up the comments.
Your proposed strategy is to CONTINUE upon seeing the hint 'X' and EXIT upon seeing the hint 'Y'. With .4 probability, you'll get 'Y' at both intersections, but you EXIT upon seeing the first 'Y' so you get 0 payoff in that case. With .4 probability, you get 'X' at both intersections, so you choose CONTINUE both times and end up at C for payoff 1. With .2 probability, you get 'X' at X and 'Y' at Y, so you choose CONTINUE and then EXIT for a payoff of 4, thus:
.4 * 0 + .4 * 1 + .2 * 4 = 1.2
I'm not confused; I probably should have stopped you at your original derivation for the partial-knowledge case but didn't want to check your algebra. And setting up problems like these is important and tricky, so this discussion belongs here.
So, I think the problem with your setup is that you don't make the outcome space fully symmetric because you don't have an equal chance of drawing Y at X and X at Y (compared to your chance of drawing X at X and Y at Y).
To formalize it for the general case of partial knowledge, plus probabilistic knowledge given action, we need to look at four possibilities: Drawing XY, XX, YY, and YX, only the first of which is correct. If, as I defined it before, the probability of being right at any given exit is r, the corresponding probabilities are: r^2, r(1-r), r(1-r), and (1-r)(1-r).
So then I have the expected payoff as a function of p, q, and r as:
(r^2)(p*q + 4p*(1 - q)) + r(1 - r)(p^2 + 4p(1 - p)) +r(1 - r)(q^2 + 4q(1 - q)) + (1 - r)(1 - r)(p*q + 4q(1 - p))
This nicely explains the previous results:
The original problem is the case of complete ignorance, r=1/2, which has a maximum 4/3 where p and q are such that they average out to choosing "continue" at one's current intersection 2/3 of the time. (And this, I think, shows you how to correctly answer while explicitly and correctly representing your probability of being at a given intersection.)
The case of (always) continuing on (guessing) X and not continuing on (guessing) Y corresponds to p=1 and q=0, which reduces to r*(3r+1), the equation I originally had.
Furthermore, it shows how to beat the payoff of 4/3 when your r is under 52%. For 51% (the original case you looked at), the max payoff is 1.361 {p=1, q=0.306388} (don't know how to show it on Alpha, since you have to constrain p and q to 0 thru 1).
Also, it shows I was in error about the 52% threshold, and the mixed strategy actually dominates all the way up to about r = 61%, at which point of course p=1 and q has shrunk to zero (continue when you see X, exit when you see Y). This corresponds to 32 millibits, much higher than my earlier estimate of 1.3 millibits.
Interesting!
I'm pretty sure Wei Dai is correct. I'll try and explain it differently. Here's a rendering of the problem in some kind of pseudolisp:
Now evaluate with the strategy under discussion, (start 1 0):
Prune the zeros:
Combine the linear paths:
I'd be interested in seeing what you think is wrong with the above derivation, ideally in terms of the actual decision problem at hand. Remember, p and q are decision parameters. They parameterize an agent, not an expectation. When p and q are 0 or 1, the agent is essentially a function of type "Bool -> Bool". How could a stateless agent of that type implement a better strategy than limiting itself to those three options?
Again, what's wrong with that derivation is it leaves out the possibility of "disinformative", and therefore assumes more knowledge about your intersection than you can really have. (By zeroing the probability of "Y then X" it concentrates the probability mass in a way that decreases the entropy of the variable more than your knowledge can justify.)
In writing the world-program in a way that categorizes your guess as "informative", you're implicitly assuming some memory of what you drew before: "Okay, so now I'm on the second one, which shows the Y-card ..."
Now, can you explain what's wrong with my derivation?
By "disinformative", do you mean that intersection X has hint Y and vice versa? This is not possible in the scenario Wei Dai describes.
Ah, this seems to be a point of confusion. The world program does not categorize your guess, at all. The "informative" label in my derivation refers to the correctness of the provided hints. Whether or not the hints are both correct is a property of the world.
No, I am merely examining the possible paths from the outside. You seem to be confusing the world program with the agent. In the "informative/continue/exit" case, I am saying "okay, so now the driver is on the second one". This does not imply that the driver is aware of this fact.
I think so. You're approaching the problem from a "first-person perspective", rather than using the given structure of the world, so you're throwing away conditional information under the guise of implementing a stateless agent. But the agent can still look at the entire problem ahead of time and make a decision incorporating this information without actually needing to remember what's happened once he begins.
At the first intersection, the state space of the world (not the agent) hasn't yet branched, so your approach gives the correct answer. At the second intersection, we (the authors of the strategy, not the agent) must update your "guess odds" conditional on having seen X at the first intersection.
Your tree was:
The outer probabilities are correct, but the inner probabilities haven't been conditioned on seeing X at the first intersection. Two out of three times that the agent sees X at the first intersection, he will see X again at the second intersection. So, assuming the p=1 q=0 strategy, the statement "Given .2/.4/.4, you will see Y 60% of the time at Y" is false.
Okay, this is where I think the misunderstanding is. When I posited the variable r, I posited it to mean the probability of correctly guessing the intersection. In other words, you receive information at that point such that it moves your estimate of which intersection you're at, accounting for other inferences you may have made about the problem, including from examining it from the outside and setting your p, to r. So the way the r is defined, it screens off knowledge gained from deciding to use p and q.
Now, this might not be a particularly relevant generalization of the problem, I now grant that. But under the premises, it's correct. A better generalization would be to find out your probability distribution across X and Y (given your choice of p), and then assume someone gives you b bits of evidence (decrease in the KL Divergence of your estimate from the true distribution), and find the best strategy from there.
And for that matter Wei_Dai's solution, given his way of incorporating partial knowledge of one's intersection, is also correct, and also probably not the best way to generalize the problem because it basically asks, "what strategy should you pick, given that you have a probably t of not being an absent-minded driver, and a probability 1 - t of being an absent-minded driver?"
Thanks, this clarifies the state of the discussion. I was basically arguing against the assertion that it was not.
I don't think I understand this. The resulting agent is always stateless, so it is always an absent-minded driver.
Are you looking for a way of incorporating information "on-the-fly" that the original strategy couldn't account for? I could be missing something, but I don't see how this is possible. In order for some hint H to function as useful information, you need to have estimates for P(H|X) and P(H|Y) ahead of time. But with these estimates on hand, you'll have already incorporated them into your strategy. Therefore, your reaction to the observation of H or the lack thereof is already determined. And since the agent is stateless, the observation can't affect anything beyond that decision.
It seems that there is just "no room" for additional information to enter this problem except from the outside.