Does this "action payoff" actually measure anything meaningful?
Well, yes: it measures payoff averaged over decision-points instead of over trips. This would be relevant if, instead of a single driver who wins the given amount upon reaching their destination, there were multiple identical selfish drivers drawn by lottery, each making a decision at one visited intersection (without knowing which one), with all of them receiving the winnings. Although the outcomes seem the same - you make a decision and get paid depending upon where the car ends up - the incentives are different.
Consider a road with 1000 intersections. Turning at the first pays $100, the second pays nothing, and the subsequent turns all pay $101. Going straight through all intersections pays nothing. Consider the two strategies of turning with 100% probability or 2% probability in each game.
In both game variants, the 100% turn always yields $100 for any driver.
In the original absent-minded driver game, turning 2% of the time is clearly a poor idea: there is a 2% chance of $100, a 1.96% chance of nothing, and a 96.04% chance of $101, for an average of $99.00. (There is a negligible chance of going straight through every intersection and getting nothing.)
In the multiple driver game, any given driver can determine (based on the objective lottery odds) that there is a 2% chance that if selected, they will be placed at intersection 1, 1.96% at intersection 2, and otherwise they are somewhere between 3 and 700 (with negligible probability of greater). If at 1 their payout will average $99.00, if at 2 they average $98.98, otherwise they average $101.00. Their expected net winnings are then $100.92.
For the multiple driver game, the 2% turn strategy is clearly better than the 100% turn strategy and this can be calculated at the start. It does not depend upon self-locating probabilities, it is just considering a different type of scenario.
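A quick way to check these figures is to compute the value of each intersection by backward recursion. Here is a minimal sketch in Python; the payoff list and the lottery weights are just my encoding of the example above:

```python
# Sketch verifying the figures above. V[k] is the expected final payoff
# given that the car is currently at intersection k (0-indexed).
N = 1000
pay = [100, 0] + [101] * (N - 2)      # payoff for turning at intersection k

def values(p):
    V = [0.0] * (N + 1)               # V[N] = 0: driving through pays nothing
    for k in range(N - 1, -1, -1):
        V[k] = p * pay[k] + (1 - p) * V[k + 1]
    return V

for p in (1.0, 0.02):
    V = values(p)
    single = V[0]                     # single absent-minded driver: whole trip
    # Multi-driver lottery: the weight of intersection k is the chance the
    # car reaches it, (1-p)^k, normalized over all decision points.
    w = [(1 - p) ** k for k in range(N)]
    multi = sum(wk * vk for wk, vk in zip(w, V)) / sum(w)
    print(f"p={p}: single ${single:.2f}, multi-driver ${multi:.2f}")
# p=1.0:  single $100.00, multi-driver $100.00
# p=0.02: single $99.00,  multi-driver $100.92
```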
So in short, there is no paradox and nothing wrong with self-locating probabilities. An absent-minded driver who uses the "action optimal" strategy is simply misapplying it to the wrong type of scenario.
If I'm reading it correctly, the basis for 50/7 "action" expected value is that the driver might have previously switched strategies from the optimal one (p=0) to a poorer local maximum (p=1/2). Subsequently they may have driven through some unknown number of turns.
If this already happened, then continuing with p=1/2 is correct and the expected value truly is 50/7. This is greater than the expected value at the start because there are two lower-value states eliminated by "currently at an intersection" conditioning.
[Edit: Oops, forgot the main point I was going to make]
The problem is that if the driver carries out this reasoning at any intersection, by the amnesic hypothesis they should apply it at every intersection. In particular they will apply it at the first intersection, where it is a mistake and they know that in that situation it is a mistake.
Assuming with no evidence that you have already made a mistake seems like a poor starting assumption for any rational decision theory.
> the basis for 50/7 "action" expected value is that the driver might have previously switched strategies from the optimal one (p=0) to a poorer local maximum (p=1/2).
I don't think that is the basis. p=1/2, as one of the action optimals, is derived by finding a stable point of the action payoff function; the expected payoff is obtained by substituting p=1/2 into the action payoff. The planning optimal p=0 was not part of that derivation, so it is not a "switch" of strategy per se. The fact that I may have already driven through some intersections is an inherent part of the problem (absentmindedness); any mixed strategy (CONTINUE with some probability) would have to face that. It is not special to action optimals like p=1/2.
Furthermore, if we are considering the action payoff function (i.e. the one using probabilities of "here is X/Y/Z"), then p=1/2 is not an inferior local maximum. At the very least it is a better point than the planning optimal p=0. Also, as long as he uses the action payoff function, the driver should indeed apply the same analysis at every intersection and arrive at p=1/2 independently, i.e. it is consistent with observation point 2: "The driver is aware he will make (has made) an identical decision at the other intersection too."
I agree using p=1/2 is a mistake. As you have pointed out, it is especially obvious at the first intersection. My position is that this mistake is due to the action payoff function being fallacious, because it uses self-locating probability. This is opposed to Aumann's explanation: that the driver cannot coordinate on p=1/2 because, due to the absentmindedness, coordination is possible only at the planning stage.
> p=1/2, as one of the action optimals, is derived by finding a stable point of the action payoff function; the expected payoff is obtained by substituting p=1/2 into the action payoff.
Yes, assuming the "action payoff" is useful at all. The scenario is that the driver determined at the start, using their knowledge of where they are, that p=0 was optimal. This is a correct decision.
The "action optimal" reasoning assumes that the driver applies it at every intersection, which means that it's only worth considering under the assumption that the driver changed their mind from p=0 to p=1/2 some time before the first intersection. Even if the "action payoff" was a useful thing to maximize (it isn't), this would still be a very dubious assumption.
> I agree using p=1/2 is a mistake. As you have pointed out, it is especially obvious at the first intersection. My position is that this mistake is due to the action payoff function being fallacious, because it uses self-locating probability.
Maximizing that quantity is a mistake whether or not self-locating probabilities are used. You can define the equivalent quantity for non-self-locating models and it isn't useful there either, for the same reasons.
> The "action optimal" reasoning assumes that the driver applies it at every intersection
This is a pretty obvious assumption, reflected by Aumann's observation 2: "The driver is aware he will make (has made) an identical decision at the other intersection too." I do not see any reason to challenge that, but if I understand correctly, you do.
> which means that it's only worth considering under the assumption that the driver changed their mind from p=0 to p=1/2 sometime before the first intersection
I disagree that this can be referred to as a change of mind. The derivation of the action optimal is an independent process from the derivation of the planning optimal. Maybe you mean the driver could only coordinate on the planning optimal due to the absentmindedness, similar to Aumann's reasoning. But then again, you don't seem to agree with his observation point 2, so I am not entirely sure about your position.
If you are saying there is no reason for the driver to change his decision from the planning stage since there is no new information, then we are making the same point. However, for the driver, the "no new information" argument applies not only to the first intersection but to any and all intersections, so I am not sure why you stress the first one. And then there is the tension between "no new information, so do not change the decision" and "there are multiple action optimal points with higher payoffs, so why not choose one of them", which I think lacks a compelling explanation.
> Maximizing that quantity is a mistake whether or not self-locating probabilities are used.
p=1/2 is not found by maximizing the action utility function. It is derived by finding stable points/Nash equilibria. p=1/2 is one of them, the same as p=0. Among these stable points, p=1/2 has the highest expected utility. In comparison, the planning utility function does not use self-locating probability, and maximizing it gives the planning optimal, which is uncontroversially useful.
> I disagree that this can be referred to as a change of mind.
Before starting the drive, the driver determines that always turning at the first intersection will be optimal. I didn't think we disagreed on that.
> p=1/2 is not found by maximizing the action utility function.
Yes, it is. You can verify this by finding the explicit expression for action utility as a function of p (a rational function consisting of a fourth-order polynomial divided by a quadratic), and verifying that it has a maximum at p=1/2. The payoffs were clearly carefully chosen to ensure this.
> Before starting the drive, the driver determines that always turning at the first intersection will be optimal. I didn't think we disagreed on that.
But the driver does not have to do any calculation before starting the drive. He can do that, yes. He can also simply choose to think about the decision only upon arriving at an intersection. It is possible for him to derive the "action optimals" chronologically before deriving the "planning optimal". As I said earlier, they are two independent processes.
> Yes, it is. You can verify this by finding the explicit expression for action utility as a function of p...
No, it was not found by maximizing the action utility function. In Aumann's process, the action utility function was not represented by a single variable p, but by multiple variables representing causally disconnected decisions (observation 1). Because the decisions ought to be the same (observation 2), the action optimals ought to be symmetrical Nash equilibria, or "stable points". You can see an example in Eliezer Yudkowsky's post. For this particular problem, there are three stable points of the action utility function: p=0, p=7/30 and p=1/2. Among these three, p=1/2 gives the highest action payoff and 7/30 the lowest.
I will take your word for it that p=1/2 also maximizes action utility. But that is just a coincidence of this particular problem, not how action optimals are found per Aumann.
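For concreteness, here is a sketch (Python with sympy) of how I read the stable-point recipe, using the payoffs of the second problem (EXIT at X/Y/Z pays 7/0/22, driving through pays 2); the variable names are mine:

```python
import sympy as sp

p, q = sp.symbols('p q')

# V_Y and V_Z: expected payoff from Y / Z onward when the *other*
# decision points CONTINUE with probability q.
V_Z = (1 - q) * 22 + q * 2
V_Y = (1 - q) * 0 + q * V_Z

# Self-locating weights of "here is X / Y / Z" under strategy q
total = 1 + q + q**2
wX, wY, wZ = 1 / total, q / total, q**2 / total

# Action payoff if *this* decision continues with probability p
# while every other decision uses q
E = (wX * ((1 - p) * 7 + p * V_Y)
     + wY * ((1 - p) * 0 + p * V_Z)
     + wZ * ((1 - p) * 22 + p * 2))

# A stable point is a q where deviating in p gains nothing
dEdp = sp.diff(E, p)
for q0 in (0, sp.Rational(7, 30), sp.Rational(1, 2)):
    print(q0, sp.simplify(dEdp.subs(q, q0)))
# q=7/30 and q=1/2 give 0; q=0 gives -7, i.e. EXIT-for-sure is
# a strict best response to itself (a boundary stable point)
```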
For the sake of clarity, let's take a step back and examine our positions. Everyone agrees p=1/2 is not the right choice. Aumann thinks this conclusion is reached in two steps:
1. Derive all action optimals by finding the stable points of the action utility function (p=1/2 is one of them, as well as p=0).
2. p=1/2 is rejected because it is not possible for the driver at different intersections to coordinate on it, due to the absentmindedness.
I disagree with both points 1 and 2, the reason being that the action utility function is fallacious. Are you rejecting both, or point 2 only, or do you agree with him?
> For the sake of clarity, let's take a step back and examine our positions. Everyone agrees p=1/2 is not the right choice. Aumann thinks this conclusion is reached in two steps:
> 1. Derive all action optimals by finding the stable points of the action utility function (p=1/2 is one of them, as well as p=0).
> 2. p=1/2 is rejected because it is not possible for the driver at different intersections to coordinate on it, due to the absentmindedness.
> I disagree with both points 1 and 2, the reason being that the action utility function is fallacious. Are you rejecting both, or point 2 only, or do you agree with him?
Point 1 is wrong. Action utility measures the wrong thing in this scenario, but does measure the correct thing for some superficially similar but actually different scenarios.
Point 2 is also wrong, because it's perfectly possible to be able to coordinate in this scenario. It's just that due to point 1 being wrong, they would be coordinating on the wrong strategy.
So we agree that both these points are incorrect, but we disagree on the reasons for them being incorrect.
OK, I think that is clearer now. I assume you think the strategy to coordinate on should be determined by maximizing the planning utility function, not by maximizing the action utility function nor by finding the stable points of the action utility function. I agree with all of this.
The difference is that you think the self-locating probabilities are valid: the action utility function that uses them is valid, but it can only be applied to superficially similar problems such as multiple drivers being randomly assigned to intersections.
While I think self-locating probabilities are not valid, and therefore the action utility function is fallacious. In problems where multiple drivers are randomly assigned to intersections, the probability of being assigned to a given intersection is not a self-locating probability.
Pretty close. I do think that self-locating probabilities can be valid, but determining the most relevant one to a given situation can be difficult. There are a lot more subtle opportunities for error than with more familiar externally supplied probabilities.
In particular, the way in which this choice of self-locating probability is used in this scenario does not suit the payoff schedule and incentives. Transforming it into related scenarios with non-self-locating probabilities is just one way to show that the problem exists.
Just for fun: an alternative treatment that eliminates indexical probabilities.
Instead of one absent-minded driver, we have some essentially identical people who are pretty good at decision theory problems, one per intersection. They reason the same way, all know that they reason the same way, etc. A random 1:1 mapping of people to intersections is chosen, and they are individually instructed to decide on a probability for whether the car should continue through the intersection or turn, should it make it there. They all receive the same utility payoff depending upon where the car ends up as in the original problems.
There are no philosophical questions here about whether it makes sense to assign probability to being a particular forgetful observer. It's just a straightforward game theory application to objective, external probabilities. However, the mathematical derivation is exactly the same as in the single absent-minded driver. They all know that whatever reasoning they use to assign a probability to continuing, by symmetry so will the others and it reduces to finding an optimal common p.
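For instance, with the first problem's payoffs (EXIT at Y pays 4, CONTINUE through pays 1), the optimal common p comes out at the familiar 2/3; a minimal numeric sketch:

```python
# Numeric sketch of the common-p reduction for the first problem:
# every person's expected utility is the trip payoff 4p(1-p) + p^2,
# so the symmetric optimum is just the planning optimum.
best = max((k / 1000 for k in range(1001)),
           key=lambda p: 4 * p * (1 - p) + p ** 2)
print(best)  # 0.667, i.e. p = 2/3
```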
In this view it is clearer that the "action" utility function simply isn't a useful concept.
When I actually run the experiment many times, I find the following when using p=4/9:
1. The fraction of decisions made at X is indeed 1/(p+1), in this case 9/13.
2. Of all the times a choice is made at X, the average resulting payoff is 96/81 (as expected).
3. Of all the times a choice is made at Y, the average resulting payoff is 24/9 (as expected).
4. The overall average payoff is not the simple combination of probability at X times payoff at X plus probability at Y times payoff at Y.
The discrepancy is caused by the driver-at-Y having to "share" a portion of the expected payoff with the previous driver-at-X.
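A minimal Monte Carlo sketch (Python) of the experiment just described, for anyone who wants to reproduce the numbers:

```python
import random

random.seed(0)
p, runs = 4 / 9, 1_000_000
pay_X, pay_Y, total = [], [], 0.0
for _ in range(runs):
    if random.random() < p:          # CONTINUE at X
        if random.random() < p:      # CONTINUE at Y -> C, payoff 1
            payoff = 1
        else:                        # EXIT at Y -> B, payoff 4
            payoff = 4
        pay_Y.append(payoff)
    else:                            # EXIT at X -> A, payoff 0
        payoff = 0
    pay_X.append(payoff)             # every run makes a decision at X
    total += payoff

decisions = len(pay_X) + len(pay_Y)
A, B = len(pay_X) / decisions, sum(pay_X) / len(pay_X)
C, D = len(pay_Y) / decisions, sum(pay_Y) / len(pay_Y)
E = total / runs
print(A, B, C, D)      # ~9/13, ~96/81, ~4/13, ~24/9
print(A*B + C*D, E)    # ~1.64 vs ~1.19: the simple combination fails
```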
Thank you for the input. I think point 4 is the key problem.
> The overall average payoff is not the simple combination of probability at X times payoff at X plus probability at Y times payoff at Y.
This is hard to justify by the usual probability calculations. Putting it into the context of the Sleeping Beauty problem, we get:
"The overall probability for Heads is not the simple combination of probability "today is Monday" times p(Heads|Monday) plus the probability "today is Tuesday" times p(Heads|Tuesday)."
This is typically denied by most, and it is the Achilles' heel of double-halving.
What I mean is that you can just calculate each of...
A: Fraction of decisions made at X
B: Average payoff of decisions made at X
C: Fraction of decisions made at Y
D: Average payoff of decisions made at Y
E: Average payoff of all runs (some of which have multiple decisions)
...and observe that AB + CD = E does not hold. The reason for this is that some payoffs are the result of multiple decisions.
> The overall probability for Heads is not the simple combination of probability "today is Monday" times p(Heads|Monday) plus the probability "today is Tuesday" times p(Heads|Tuesday).
I'm not a double-halfer, so I don't have a problem denying this.
I think the interesting question is what AB+CD actually means. If we treat the fraction of decisions at X as the probability that "here is X", and the same for Y, then AB+CD should be the expected payoff of this decision, and typically the best decision would be derived by maximizing it. But clearly that leads to wrong results such as p=4/9. So what's wrong?
My position is that AB+CD is meaningless. It is a fallacious payoff, because self-locating probabilities are invalid. This also resolves the double-halfers' problem, but let's leave that aside for the time being.
If I understand correctly, your position is that maximizing AB+CD is not correct, because when deciding we should be maximizing the payoff of runs rather than of this decision. Here I just want to point out that the payoff for runs (the planning utility function) does not use self-locating probabilities. You didn't say whether AB+CD is meaningful or not.
Aumann thinks AB+CD is meaningful but that maximizing it is wrong. He pointed out that the decisions at X and at Y are causally disconnected, yet the two decisions ought to be the same: a symmetrical Nash equilibrium. So the correct decision is a stable point of AB+CD. The problem is that when there are multiple stable points, which one is the optimal decision - the one with the highest AB+CD? Aumann says no: it should be the point that maximizes the planning payoff function (the function not using self-locating probabilities).
I am not convinced by this. First of all, it lacks a compelling reason; the explanation "due to the absentmindedness" is ad hoc. Second, by his reasoning, AB+CD effectively plays no part in the decision-making process: the decision maximizing the planning utility function is always going to be a stable point of AB+CD, and it is always going to be chosen regardless of what value it gives for AB+CD. So the whole argument for AB+CD being meaningful lacks substantial support.
Each of A, B, C, D is measurable, and so AB+CD is measurable as well. But since that combination isn't the expected value we want to be maximizing, I would certainly say it's not useful, and at that point I wouldn't really care whether it's meaningful (if it does "mean" anything, it would be something convoluted and not really relevant to the decision).
The absent-minded driver problem is this:
Fig 1. (Taken from The Absent-Minded Driver by Robert J. Aumann, Sergiu Hart, and Motty Perry)
“An absent-minded driver starts driving at START in Figure 1. At X he can either EXIT and get to A (for a payoff of 0) or CONTINUE to Y. At Y he can either EXIT and get to B (payoff 4), or CONTINUE to C (payoff 1). The essential assumption is that he cannot distinguish between intersections X and Y, and cannot remember whether he has already gone through one of them.”
It can easily be shown that the best strategy is to CONTINUE with a probability of 2/3 at an intersection: let p be the probability of CONTINUE at an intersection; the payoff is then 4p(1-p)+p^2, which is maximized when p=2/3.
Yet there is a paradox. Imagine you are the driver. Upon arriving at an intersection, you could assign a certain probability to “here is X”. Let this probability be α. Now the payoff function would be α[4(1-p)p+p^2] + (1-α)[4(1-p)+p]. Maximizing it would make p different from 2/3, no matter how one chooses to determine the value of α - except of course if α=1, i.e. the probability of “here is X” is 100%, which is impossible to justify.
For example, a common approach is to let α=1/(1+p), the reason being that the driver always passes X with a probability of 1 and passes Y with a probability of p, so the probability of “here is X” should be the relative weight of the two. Using this value for α, the payoff function is maximized when p=4/9, different from 2/3.
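Both numbers can be checked symbolically. A minimal sketch using sympy (my own encoding of the two payoff functions; the 4/9 falls out of maximizing over p with α held fixed and then requiring α=1/(1+p) to be consistent):

```python
import sympy as sp

p, a = sp.symbols('p alpha', positive=True)

# Planning payoff: 4p(1-p) + p^2, maximized directly over p
planning = 4*p*(1 - p) + p**2
print(sp.solve(sp.Eq(sp.diff(planning, p), 0), p))       # [2/3]

# Action payoff using the self-locating probability alpha of "here is X"
action = a*(4*(1 - p)*p + p**2) + (1 - a)*(4*(1 - p) + p)

# Stationarity in p (alpha held fixed) plus the consistency
# condition alpha = 1/(1+p)
eqs = [sp.Eq(sp.diff(action, p), 0), sp.Eq(a, 1/(1 + p))]
print(sp.solve(eqs, [p, a]))                              # [(4/9, 9/13)]
```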
Aumann’s Answer
Aumann, Hart, and Perry disagree with the above reasoning and deny that the problem presents any paradox. First, they make a distinction between planning optimal and action optimal. The original p=2/3 is regarded as the planning optimal; the rational decision for the driver upon arriving at an intersection is dubbed the action optimal. The action optimal is derived from the following observations:
1. The decisions made at the different intersections are causally disconnected from one another.
2. The driver is aware he will make (has made) an identical decision at the other intersection too.
I will skip the details (Eliezer Yudkowsky made a post about it here). Their method of determining the action optimal essentially treats it as a game of coordination between “me at this intersection” and “me at the other intersection”. The result is a stable point of the payoff function, similar to a Nash equilibrium, which indeed gives p=2/3. Furthermore, the planning optimal will always be an action optimal.
However, problems still exist when there is more than one action optimal. For example:
Figure 2. (Taken from the same source as before.)
For this problem, the planning optimal is p=0: always EXIT at an intersection, which gives the highest payoff of 7. However, there are other stable points, aka action optimals, notably p=1/2. In the planning stage, p=1/2 would give a payoff of 6.5 (as 1/2*7 + 1/4*0 + 1/8*22 + 1/8*2). Yet in the action stage, its payoff changes to 50/7. (To see this, notice that for p=1/2 the probabilities of “here is X/Y/Z” are 4/7, 2/7 and 1/7 respectively. So the payoff is 4/7*(1/2*7 + 1/4*0 + 1/8*22 + 1/8*2) + 2/7*(1/2*0 + 1/4*22 + 1/4*2) + 1/7*(1/2*22 + 1/2*2) = 50/7.) In the action stage, the payoff for p=0 is still 7; the value remains unchanged because for an always-EXIT strategy the probability of “here is X” is indeed 100%. So why shouldn’t the driver disregard the planning optimal and choose to CONTINUE with p=1/2 upon arriving at an intersection, since it gives a higher expected payoff than always EXIT (50/7 > 7)?
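The arithmetic above is easy to verify exactly; a minimal sketch in Python, using exact fractions (the helper names are mine):

```python
from fractions import Fraction as F

def planning(p):
    # EXIT at X (7), EXIT at Y (0), EXIT at Z (22), CONTINUE through Z (2)
    return (1-p)*7 + p*(1-p)*0 + p*p*(1-p)*22 + p**3*2

def action(p):
    wX, wY, wZ = 1, p, p*p                # relative weights of "here is X/Y/Z"
    vX = planning(p)                      # expected payoff as seen from X
    vY = (1-p)*0 + p*((1-p)*22 + p*2)     # ... as seen from Y
    vZ = (1-p)*22 + p*2                   # ... as seen from Z
    return (wX*vX + wY*vY + wZ*vZ) / (wX + wY + wZ)

print(planning(F(0)), planning(F(1, 2)))  # 7 and 13/2 (= 6.5)
print(action(F(0)), action(F(1, 2)))      # 7 and 50/7
```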
Aumann’s explanation is based on game theory. When there are multiple equilibrium points, an actor cannot actively “choose” which one to take (observation point 1 reflects this); the equilibrium is determined by outside forces such as customs and traditions. In his view, because of the absent-mindedness, the coordination between “the driver at this intersection” and “the driver at other intersections” can only be done in the planning stage. This gives p=0, even though it has a lower expected payoff at the action stage.
I find this explanation unconvincing. Even if a game of coordination is played between different persons, each person picking the stable point with the highest payoff is still a very plausible outcome. The given rationale for picking a lower-payoff strategy at the action stage, i.e. “due to the absent-mindedness”, lacks compelling reason and seems ad hoc. To me, the root of the problem is how p=1/2 can be regarded as the highest-payoff strategy at all. Clearly, even if the drivers at the three intersections somehow do coordinate on p=1/2, the average payoff would still only be 6.5. The payoff of 50/7 at the action stage could never be realized or verified. Furthermore, by Aumann’s reasoning, 50/7 being the highest payoff is never relevant to the decision anyway.
Another Problem Caused by Self-Locating Probability
I think this paradox is caused by accepting a probability for “here is X”. It is a self-locating probability. As I have argued in anthropic paradoxes, self-locating probabilities are not valid concepts. For an oversimplified summary, “here” is a location primitively understood from a particular perspective due to its immediacy to subjective experience. Because of its primitive nature, there is no way to define an inherent reference class nor assign any probability distribution.
To comprehend “here”, this problem requires us to think from the specific perspective of the driver as he arrives at an intersection. So any notion of a probability for “here is X/Y/Z” is fallacious. As a result, the expected payoff calculation during the action stage (50/7) is misguided. There is only one legitimate payoff function: the one at the planning stage. The driver should keep his original strategy when arriving at an intersection, simply because there is no new information.
An Embarrassment for Double Halfers?
The invalidity of self-locating beliefs also resolves the major problem for double halferism in the Sleeping Beauty problem: why not perform a Bayesian update upon learning that today is Monday/Tuesday? Because there is no probability for “today is Monday/Tuesday” in the first place.
Michael Titelbaum formulated an example to demonstrate that double halfers cannot reasonably keep their belief about a coin yet to be tossed at 1/2 (the objective chance). This experiment, dubbed “an embarrassment for double halfers”, is highly similar to the original Sleeping Beauty problem. Here the original coin is tossed after the first awakening, on Monday afternoon, and a second, inconsequential coin toss is added on Tuesday afternoon. Now, after waking up in the experiment, answer the question: “What is the probability that today’s coin toss will land on heads?”
Titelbaum argues that “today’s coin landed Heads” includes two possibilities: “today is Monday and the original coin landed heads” or “today is Tuesday and the new coin landed heads”. So double halfers can either keep their answer for the original coin at 1/2, which makes the probability for the yet-to-be-tossed coin bigger than half, or say today’s coin has a probability of 1/2 and make the probability for the original coin smaller. Unless, of course, they believe the probability of “today is Monday” is 100%, which is impossible to justify.
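To spell out the arithmetic of the first horn (my notation): a Tuesday awakening can only occur if the original coin landed tails, so a double halfer who keeps P(original coin heads) = 1/2 is committed to P(today is Monday and original heads) = 1/2. Then P(today’s coin heads) = P(Monday and original heads) + P(today is Tuesday)*1/2 = 1/2 + P(today is Tuesday)/2, which exceeds 1/2 whenever any nonzero probability is assigned to “today is Tuesday”.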
This format is similar to the absent-minded driver problem, and it can be resolved the same way: by recognizing “today” as a primitive concept, invalidating the self-locating probability. Nonetheless, I think it is an excellent example. It shows that the typical explanation for double-halving, i.e. inventing new rules of update for self-locating information, is not going to work. As long as self-locating probabilities are regarded as valid, double halfers’ answers will inevitably deviate from the objective chance.