Thanks for posting. Your analysis is an improvement over the LW conventional wisdom, but you still don't get it right, where "right", to me, means the way it is analyzed by the guys who won all those Nobel prizes in economics. You write:
First, let's note that there definitely are possible cases where it would be "beneficial to be irrational".
But in every example you supply, what you really want is not exactly to be irrational; rather, it is to be believed irrational by the other player in the game. But you don't notice this because in each of your artificial examples the other player is effectively omniscient, so the only way to be believed irrational is to actually be irrational. But then, once the other player really believes you are irrational, his strategies and actions are modified in such a way that your expected behavior (which would have been irrational if the other player had not come to believe you irrational) is now no longer irrational!
But, better yet, let's Taboo the word "irrational". What you really want him to believe is that you will play some particular strategy. If he does, in fact, believe this, then he will choose a particular strategy, and your own best response is t...
But in every example you supply, what you really want is not exactly to be irrational; rather it is to be believed irrational by the other player in the game.
I don't think that's the real problem: after all, Parfit's Hitchhiker and Newcomb's problem also eliminate this distinction by positing an Omega that will not be wrong in its predictions.
The real problem is that Chappell has delineated a failure mode that we don't care about. TDT/UDT are optimized for situations in which the world only cares about what you would do, not why you decide to do so. In Chappell's examples, there's no corresponding action that forms the basis of the failure; the "ritual of cognition" alone determines your punishment.
The EY article he linked to ("Newcomb's Problem and the Regret of Rationality") makes the irrelevance of these cases very clear:
...Next, let's turn to the charge that Omega favors irrationalists. I can conceive of a superbeing who rewards only people born with a particular gene, regardless of their choices. I can conceive of a superbeing who rewards people whose brains inscribe the particular algorithm of "Describe your options in English and choose the l
Here's another way of looking at the situation that may or may not be helpful. Suppose I ask you, right here and now, what you'd do in the hypothetical future Parfit's Hitchhiker scenario if your opponent was a regular human with Internet access. You have several options:
Answer truthfully that you'd pay $100, thus proving that you don't subscribe to CDT or EDT. (This is the alternative I would choose.)
Answer that you'd refuse to pay. Now you've created evidence on the Internet, and if/when you face the scenario in real life, the driver will Google your name, check the comments on LW and leave you in the desert to die. (Assume the least convenient possible world where you can't change or delete your answer once it's posted.)
Answer that you'd pay up, but secretly plan to refuse. This means you'd be lying to us here in the comments - surely not a very nice thing to do. But if you subscribe to CDT with respect to utterances as well as actions, this is the alternative you're forced to choose. (Which may or may not make you uneasy about CDT.)
Wei Dai wrote a post entitled The Absent-Minded Driver which I labeled "snarky". Moreover, I suggested that the snarkiness was so bad as to be nauseating, bad enough to drive reasonable people to flee in horror from LW and SIAI. I here attempt to defend these rather startling opinions. Here is what Wei Dai wrote that offended me:
This post examines an attempt by professional decision theorists to treat an example of time inconsistency, and asks why they failed to reach the solution (i.e., TDT/UDT) that this community has more or less converged upon. (Another aim is to introduce this example, which some of us may not be familiar with.) Before I begin, I should note that I don't think "people are crazy, the world is mad" (as Eliezer puts it) is a good explanation. Maybe people are crazy, but unless we can understand how and why people are crazy (or to put it more diplomatically, "make mistakes"), how can we know that we're not being crazy in the same way or making the same kind of mistakes?
The paper that Wei Dai reviews is "The Absent-Minded Driver" by Robert J. Aumann, Sergiu Hart, and Motty Perry. Wei Dai points out, rather condescendingly...
There are a few essential questions here:
My claim is purely theoretical: we need to distinguish, conceptually, between desirable dispositions and rational actions. It seems to me that many on LW fail to make this conceptual distinction, which can lead to mistaken (or at least under-argued) theorizing about rationality.
This is because actions only ever arise from dispositions. Yes, given that Omega has predicted you will one-box, it would (as an abstract fact) be to your benefit to two-box; but in order for you to actually two-box, you would have to execute some instruction in your source code, which, if it were present, Omega would have read, and thus would not have predicted that you would one-box.
Hence only dispositions are of interest.
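To make this concrete, here is a minimal toy model of that point (my own sketch, with stand-in policy functions and a stand-in predictor, not anyone's actual proposal): the agent just is its decision procedure, and Omega is handed that same procedure before filling the boxes.

```python
# Toy Newcomb setup: the agent is identified with a policy function,
# and the predictor reads that same function before acting.

def predictor_fills_box_b(policy):
    """Omega simulates the agent's policy and puts $1,000,000 in box B
    only if the simulated agent one-boxes."""
    return policy() == "one-box"

def play_newcomb(policy):
    box_b = 1_000_000 if predictor_fills_box_b(policy) else 0
    action = policy()  # the actual choice is just another call to the code Omega read
    return box_b if action == "one-box" else box_b + 1_000

one_boxer = lambda: "one-box"
two_boxer = lambda: "two-box"

print(play_newcomb(one_boxer))   # 1000000
print(play_newcomb(two_boxer))   # 1000
```

There is no way to hand Omega a policy it reads as one-boxing and then have "yourself" two-box anyway: the final action is just another execution of the same source, so disposition and action cannot come apart in this setup.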
It was good to have the disposition to ignore threats
But not as good as the disposition to ignore threats, except when the threats are caused by transparently accidental mental glitches (which would not be encouraged by the disposition).
Eliezer's theory is more-or-less causal decision theory with a different account of dependency hypotheses/counterfactuals. The most relevant philosophical disputes would be about whether to use "local miracle" counterfactuals rather than various backtracking counterfactuals, or logical/mathematical counterfactuals (Eliezer's timeless decision theory idea).
"Due to an unexpected mental glitch, he threatens Joe again. Joe follows his disposition and ignores the threat. BOOM. Here Joe's final decision seems as disastrously foolish as Tom's slip up."
But of course, the initial decision to take the pill may be rational, and the "final decision" is constrained so much that we might regard it as a "decision" in name only. The way I see it: When Joe takes the pill, he will stop rational versions of Tom from threatening him, meaning he benefits, but will be at increased risk of irration...
Sort of a side note to the main topic of discussion, but since my post was quoted, maybe it's worth responding:
The great thing about comparing an argument to one in the philosophical literature is that it provides access to a whole range of papers on the issue, so that ideas don't need to be rediscovered. The corresponding bad thing, though, is that it makes it easy to accidentally commit a straw-man attack if the argument isn't actually the same as the one in the literature. So I'll outline my argument (basically I'll expand on the quote of mine you used).
If we thin...
For any given concept of "rational (action)" that's not defined as "(the action) arranging for the best expected winning", you can of course find a situation where that concept and winning are at odds. But if you define them to be the same, it's no longer possible. At that point, you can be taxed for being a given program and not another program (or for the fact that pi is less than 10, for that matter), something you don't control, but such a criterion won't be about the rationality of your decision-making, because it doesn't provide a suggest...
I'm curious about the downvotes. Do others disagree with me that Parfit's threat ignorer case (and the distinction it illustrates between evaluating dispositions and actions) is worth considering?
In most of these cases we can distinguish further: what is rational is to act in a certain way and to have a certain reputation. This has the benefit of being more airtight - one can argue for a logical relationship between disposition and action. (In Newcomb, the existence of an omniscient agent makes them all equivalent, but weird assumptions lead to weird conclusions.)
Your discussion of the threat game is utterly dissolved by game theory. The game between Tom and Joe has a mixed Nash equilibrium where both make some sort of "probabilistic precommitments", and neither can improve their outcome by changing their "disposition" while assuming the other's "disposition" as given.
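For anyone who wants to see the arithmetic, here is a sketch with illustrative payoffs of my own choosing (nothing from the thread); I also assume that adopting the compliant disposition carries a small standing cost for Joe, which is what gives the game a fully mixed equilibrium.

```python
# 2x2 "threat game": Tom chooses Threaten / Don't, Joe chooses a
# disposition Comply / Ignore. Payoffs below are illustrative only.
#
#                 Joe: Comply     Joe: Ignore
# Tom: Threaten    ( 1, -1)        (-10, -10)
# Tom: Don't       ( 0, -1)        (  0,   0)
#
# (Complying is assumed to cost Joe -1 even when no threat comes.)

tom = [[1, -10],
       [0,   0]]
joe = [[-1, -10],
       [-1,   0]]

# In a fully mixed equilibrium, each player's mixture makes the *other*
# player indifferent between their two pure strategies.

# p = P(Joe complies): chosen so Tom is indifferent between Threaten and Don't.
p = (tom[1][1] - tom[0][1]) / (tom[0][0] - tom[0][1] - tom[1][0] + tom[1][1])

# q = P(Tom threatens): chosen so Joe is indifferent between Comply and Ignore.
q = (joe[1][1] - joe[1][0]) / (joe[0][0] - joe[1][0] - joe[0][1] + joe[1][1])

print(f"Joe complies with probability {p:.3f}")   # ~0.909
print(f"Tom threatens with probability {q:.3f}")  # 0.100

# Check: at (p, q), neither player gains by deviating to a pure strategy.
assert abs((p*tom[0][0] + (1-p)*tom[0][1]) - (p*tom[1][0] + (1-p)*tom[1][1])) < 1e-9
assert abs((q*joe[0][0] + (1-q)*joe[1][0]) - (q*joe[0][1] + (1-q)*joe[1][1])) < 1e-9
```

With these (assumed) numbers, neither player can improve their expected payoff by unilaterally shifting their "probabilistic precommitment", which is the sense in which the equilibrium analysis dissolves the puzzle.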
I've been tinkering with the idea of making a top level post on this issue, but figured it would get excessively downvoted. So I'll risk it here.
For any decision theory, isn't there some hypothetical where Omega can say, "I've analyzed your decision theory, and I'm giving you proposition X, such that if you act the way your decision theory believes is optimal, you will lose?" The "Omega scans your brain and tortures you if you're too rational" would be an obvious example of this.
Designing a decision theory around any such problem seems ...
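Here is a cartoon of the shape of that worry (the labels and payoffs are hypothetical, chosen only to show the structure): Omega keys the outcome to which decision procedure the agent runs, not to anything the agent does.

```python
# Omega rewards or punishes the agent's decision procedure itself
# (the "ritual of cognition"), regardless of the action taken.

def omega_payout(theory_label, action):
    if theory_label == "TDT":  # the targeted theory loses no matter what it does
        return -1_000_000
    return 100 if action == "accept" else 0

for theory in ("TDT", "CDT"):
    for action in ("accept", "decline"):
        print(theory, action, omega_payout(theory, action))
```

No choice available to the targeted theory changes its payoff, which is why this failure mode says nothing about the quality of the actions the theory recommends.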
A common background assumption on LW seems to be that it's rational to act in accordance with the dispositions one would wish to have. (Rationalists must WIN, and all that.)
E.g., Eliezer:
And more recently, from AdamBell:
Within academic philosophy, this is the position advocated by David Gauthier. Derek Parfit has constructed some compelling counterarguments against Gauthier, so I thought I'd share them here to see what the rest of you think.
First, let's note that there definitely are possible cases where it would be "beneficial to be irrational". For example, suppose an evil demon ('Omega') will scan your brain, assess your rational capacities, and torture you iff you surpass some minimal baseline of rationality. In that case, it would very much be in your interests to fall below the baseline! Or suppose you're rewarded every time you honestly believe the conclusion of some fallacious reasoning. We can easily multiply cases here. What's important for now is just to acknowledge this phenomenon of 'beneficial irrationality' as a genuine possibility.
This possibility poses a problem for the Eliezer-Gauthier methodology. (Quoting Eliezer again:)
The problem, obviously, is that it's possible for irrational agents to receive externally-generated rewards for their dispositions, without this necessarily making their downstream actions any more 'reasonable'. (At this point, you should notice the conflation of 'disposition' and 'choice' in the first quote from Eliezer. Rachel does not envy Irene her choice at all. What she wishes is to have the one-boxer's dispositions, so that the predictor puts a million in the first box, and then to confound all expectations by unpredictably choosing both boxes and reaping the most riches possible.)
To illustrate, consider (a variation on) Parfit's story of the threat-fulfiller and threat-ignorer. Tom has a transparent disposition to fulfill his threats, no matter the cost to himself. So he straps on a bomb, walks up to his neighbour Joe, and threatens to blow them both up unless Joe shines his shoes. Seeing that Tom means business, Joe sensibly gets to work. Not wanting to repeat the experience, Joe later goes and pops a pill to acquire a transparent disposition to ignore threats, no matter the cost to himself. The next day, Tom sees that Joe is now a threat-ignorer, and so leaves him alone.
So far, so good. It seems this threat-ignoring disposition was a great one for Joe to acquire. Until one day... Tom slips up. Due to an unexpected mental glitch, he threatens Joe again. Joe follows his disposition and ignores the threat. BOOM.
Here Joe's final decision seems as disastrously foolish as Tom's slip-up. It was good to have the disposition to ignore threats, but that doesn't necessarily make it a good idea to act on it. We need to distinguish the desirability of a disposition to X from the rationality of choosing to do X.