Thanks for posting. Your analysis is an improvement over the LW conventional wisdom, but you still don't get it right, where "right", to me, means the way it is analyzed by the guys who won all those Nobel prizes in economics. You write:
First, let's note that there definitely are possible cases where it would be "beneficial to be irrational".
But in every example you supply, what you really want is not exactly to be irrational; rather, it is to be believed irrational by the other player in the game. You don't notice this because in each of your artificial examples the other player is effectively omniscient, so the only way to be believed irrational is to actually be irrational. But once the other player really believes it, his strategies and actions change in such a way that your expected behavior (which would have been irrational if the other player had not come to believe you irrational) is no longer irrational!
But, better yet, let's Taboo the word "irrational". What you really want him to believe is that you will play some particular strategy. If he does, in fact, believe it, then he will choose a particular strategy, and your own best response is t...
But in every example you supply, what you really want is not exactly to be irrational; rather it is to be believed irrational by the other player in the game.
I don't think that's the real problem: after all, Parfit's Hitchhiker and Newcomb's problem also eliminate this distinction by positing an Omega that will not be wrong in its predictions.
The real problem is that Chappell has delineated a failure mode that we don't care about. TDT/UDT are optimized for situations in which the world cares only about what you would do, not why you decide to do it. In Chappell's examples, there's no corresponding action that forms the basis of the failure; the "ritual of cognition" alone determines your punishment.
The EY article he linked to ("Newcomb's Problem and the Regret of Rationality") makes the irrelevance of these cases very clear:
...Next, let's turn to the charge that Omega favors irrationalists. I can conceive of a superbeing who rewards only people born with a particular gene, regardless of their choices. I can conceive of a superbeing who rewards people whose brains inscribe the particular algorithm of "Describe your options in English and choose the l
Here's another way of looking at the situation that may or may not be helpful. Suppose I ask you, right here and now, what you'd do in the hypothetical future Parfit's Hitchhiker scenario if your opponent was a regular human with Internet access. You have several options:
Answer truthfully that you'd pay $100, thus proving that you don't subscribe to CDT or EDT. (This is the alternative I would choose.)
Answer that you'd refuse to pay. Now you've created evidence on the Internet, and if/when you face the scenario in real life, the driver will Google your name, check the comments on LW and leave you in the desert to die. (Assume the least convenient possible world where you can't change or delete your answer once it's posted.)
Answer that you'd pay up, but secretly plan to refuse. This means you'd be lying to us here in the comments - surely not a very nice thing to do. But if you subscribe to CDT with respect to utterances as well as actions, this is the alternative you're forced to choose. (Which may or may not make you uneasy about CDT.)
There are a few essential questions here:
My claim is purely theoretical: we need to distinguish, conceptually, between desirable dispositions and rational actions. It seems to me that many on LW fail to make this conceptual distinction, which can lead to mistaken (or at least under-argued) theorizing about rationality.
This is because actions only ever arise from dispositions. Yes, given that Omega has predicted you will one-box, it would (as an abstract fact) be to your benefit to two-box; but in order for you to actually two-box, you would have to execute some instruction in your source code, which, if it were present, Omega would have read, and thus would not have predicted that you would one-box.
Hence only dispositions are of interest.
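Here is a minimal Python sketch of this argument. It assumes that Omega predicts simply by running the agent's code, and it uses the usual $1,000 / $1,000,000 Newcomb amounts. The names `play_newcomb`, `one_boxer` and `two_boxer` are hypothetical. The only lever an agent has is which code it is, and that is the sense in which only dispositions matter here.

```python
from typing import Callable

# Toy model (an assumption, not any published formalism): a "disposition"
# is a zero-argument function that returns "one-box" or "two-box".
Agent = Callable[[], str]

SMALL = 1_000        # transparent box
BIG = 1_000_000      # opaque box, filled only if Omega predicts one-boxing


def omega_predicts_one_box(agent: Agent) -> bool:
    # Omega "reads your source code" by running it.
    return agent() == "one-box"


def play_newcomb(agent: Agent) -> int:
    opaque = BIG if omega_predicts_one_box(agent) else 0
    # The actual choice comes from the very same code Omega just ran,
    # so a deterministic agent cannot be predicted one way and act another.
    choice = agent()
    return opaque if choice == "one-box" else opaque + SMALL


def one_boxer() -> str:
    return "one-box"


def two_boxer() -> str:
    return "two-box"


if __name__ == "__main__":
    print(play_newcomb(one_boxer))  # 1000000
    print(play_newcomb(two_boxer))  # 1000
```

Rachel's plan (be predicted a one-boxer, then take both boxes) has no counterpart in this model. Any function that returns "two-box" at choice time also returns it when Omega runs it.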
It was good to have the disposition to ignore threats
But not as good as the disposition to ignore threats, except when the threats are caused by transparently accidental mental glitches (which would not be encouraged by the disposition).
Eliezer's theory is more or less causal decision theory with a different account of dependency hypotheses/counterfactuals. The most relevant philosophical disputes would be about whether to use "local miracle" counterfactuals rather than various backtracking counterfactuals, or logical/mathematical counterfactuals (Eliezer's timeless decision theory idea).
"Due to an unexpected mental glitch, he threatens Joe again. Joe follows his disposition and ignores the threat. BOOM. Here Joe's final decision seems as disastrously foolish as Tom's slip up."
But of course, the initial decision to take the pill may be rational, and the "final decision" is constrained so much that we might regard it as a "decision" in name only. The way I see it: When Joe takes the pill, he will stop rational versions of Tom from threatening him, meaning he benefits, but will be at increased risk of irration...
Sort of a side note to the main topic of discussion, but since my post was quoted, it may be worth responding:
The great thing about comparing an argument to one in the philosophical literature is that it provides access to a whole range of papers on the issue, so that ideas don't need to be rediscovered. The corresponding downside, though, is that it makes it easy to accidentally attack a straw man if the argument isn't actually the same as the one in the literature. So I'll outline my argument (basically, I'll expand on the quote of mine you used).
If we thin...
For any given concept of "rational (action)" that's not defined as "(the action) arranging for the best expected winning", you can of course find a situation where that concept and winning are at odds. But if you define them to be the same, that's no longer possible. At that point, you can still be taxed for being a given program rather than some other program (or for the fact that pi is less than 10, for that matter), something you don't control, but such a criterion won't be about the rationality of your decision-making, because it doesn't provide a suggest...
I'm curious about the downvotes. Do others disagree with me that Parfit's threat ignorer case (and the distinction it illustrates between evaluating dispositions and actions) is worth considering?
In most of these cases we can distinguish further: what is rational is to act in a certain way and to have a certain reputation. This has the benefit of being more airtight - one can argue for a logical relationship between disposition and action. (In Newcomb, the existence of an omniscient agent makes them all equivalent, but weird assumptions lead to weird conclusions.)
Your discussion of the threat game is utterly dissolved by game theory. The game between Tom and Joe has a mixed Nash equilibrium where both make some sort of "probabilistic precommitments", and neither can improve their outcome by changing their "disposition" while assuming the other's "disposition" as given.
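A mixed equilibrium of this kind can be computed for a toy version of the Tom/Joe interaction. The sketch below models it as a one-shot game of Chicken with made-up payoffs that do not come from the thread. It treats the "probabilistic precommitments" as each player's probability of holding firm, and it finds them with the standard indifference conditions for a 2x2 game. The function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical Chicken-style payoffs (not from the thread), indexed
# [tom_move][joe_move].  Tom: 0 = threaten and follow through, 1 = back down.
# Joe: 0 = ignore the threat, 1 = comply.
TOM = [[-10, 1],
       [-1, 0]]
JOE = [[-10, -1],
       [1, 0]]


def mixed_equilibrium_2x2(row, col):
    """Return (p, q) with p = P(row plays 0) and q = P(col plays 0).

    Each player's mix makes the other player indifferent between their
    two pure moves. Assumes an interior equilibrium exists (nonzero
    denominators, probabilities strictly between 0 and 1).
    """
    q = Fraction(row[1][1] - row[0][1],
                 row[0][0] - row[0][1] - row[1][0] + row[1][1])
    p = Fraction(col[1][1] - col[1][0],
                 col[0][0] - col[0][1] - col[1][0] + col[1][1])
    return p, q


def row_value(row, move, q):
    # Row player's expected payoff from `move` when column plays 0 w.p. q.
    return q * row[move][0] + (1 - q) * row[move][1]


def col_value(col, move, p):
    # Column player's expected payoff from `move` when row plays 0 w.p. p.
    return p * col[0][move] + (1 - p) * col[1][move]


if __name__ == "__main__":
    p, q = mixed_equilibrium_2x2(TOM, JOE)
    print(p, q)        # 1/10 1/10
    print(p * q)       # 1/100: probability that the bomb goes off
```

At these mixes neither player can gain by shifting probability between their two moves, given the other's mix. This is the sense in which neither can improve by changing their "disposition" alone.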
I've been tinkering with the idea of making a top level post on this issue, but figured it would get excessively downvoted. So I'll risk it here.
For any decision theory, isn't there some hypothetical where Omega can say, "I've analyzed your decision theory, and I'm giving you proposition X, such that if you act the way your decision theory believes is optimal, you will lose?" The "Omega scans your brain and tortures you if you're too rational" would be an obvious example of this.
Designing a decision theory around any such problem seems ...
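The diagonal construction can be sketched in a few lines of Python. This is a toy model under stated assumptions: a decision theory is a deterministic function from a menu of options to a choice, and Omega fixes the payoffs by simulating that function. The theory never sees the payoff table. The names `adversarial_payoffs`, `pick_first` and `pick_last` are hypothetical.

```python
from typing import Callable, Dict, Sequence

# A "decision theory" here is any deterministic rule that picks an option
# from a menu. Toy model; the names are hypothetical.
DecisionTheory = Callable[[Sequence[str]], str]


def adversarial_payoffs(theory: DecisionTheory,
                        options: Sequence[str]) -> Dict[str, int]:
    """Omega simulates `theory` on the menu, then pays 0 for the option it
    predicts and 1 for every other option. The theory never sees this table."""
    predicted = theory(options)
    return {opt: (0 if opt == predicted else 1) for opt in options}


def play(theory: DecisionTheory, options: Sequence[str]) -> int:
    payoffs = adversarial_payoffs(theory, options)
    return payoffs[theory(options)]


def pick_first(options: Sequence[str]) -> str:
    return options[0]


def pick_last(options: Sequence[str]) -> str:
    return options[-1]


if __name__ == "__main__":
    menu = ["one-box", "two-box"]
    for theory in (pick_first, pick_last):
        print(theory.__name__, play(theory, menu))  # both score 0
```

Every deterministic theory scores 0 on the problem built against it, even though another option on the same menu paid 1.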
Wei-Dai wrote a post entitled "The Absent-Minded Driver", which I labeled "snarky". Moreover, I suggested that the snarkiness was so bad as to be nauseating, so as to drive reasonable people to flee in horror from LW and SIAI. I here attempt to defend these rather startling opinions. Here is what Wei-Dai wrote that offended me:
This post examines an attempt by professional decision theorists to treat an example of time inconsistency, and asks why they failed to reach the solution (i.e., TDT/UDT) that this community has more or less converged upon. (Another aim is to introduce this example, which some of us may not be familiar with.) Before I begin, I should note that I don't think "people are crazy, the world is mad" (as Eliezer puts it) is a good explanation. Maybe people are crazy, but unless we can understand how and why people are crazy (or to put it more diplomatically, "make mistakes"), how can we know that we're not being crazy in the same way or making the same kind of mistakes?
The paper that Wei-Dai reviews is "The Absent-Minded Driver" by Robert J. Aumann, Sergiu Hart, and Motty Perry. Wei-Dai points out, rather condescendingly:
(Notice that the authors of this paper worked for a place called Center for the Study of Rationality, and one of them won a Nobel Prize in Economics for his work on game theory. I really don't think we want to call these people "crazy".)
Wei-Dai then proceeds to give a competent description of the problem and the standard "planning-optimality" solution of the problem. Next comes a description of an alternative seductive-but-wrong solution by Piccione and Rubinstein. I should point out that everyone (P&R; Aumann, Hart, and Perry; Wei-Dai; me; and, hopefully, you who look into this) realizes that the alternative P&R solution is wrong. It gets the wrong result. It doesn't win. The only problem is explaining exactly where the analysis leading to that solution went astray, and explaining how it might be modified so as to go right. Making this analysis was, as I see it, the whole point of both papers, P&R's and Aumann et al.'s. Wei-Dai describes some characteristics of Aumann et al.'s corrected version of the alternative solution. Then he (?) goes horribly astray:
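Here is a small Python sketch of the numbers behind the "planning-optimality" solution. It assumes the payoffs usually attached to this example in the literature: 0 for exiting at the first intersection X, 4 for exiting at the second intersection Y, and 1 for driving past both. These figures do not appear in the comment itself. The driver picks a single probability p of continuing, because he cannot tell the intersections apart.

```python
from fractions import Fraction

# Payoffs commonly used for the Absent-Minded Driver example; they are not
# stated in this thread, so treat them as an assumption.
EXIT_AT_X = 0         # a: exit at the first intersection
EXIT_AT_Y = 4         # b: exit at the second intersection
CONTINUE_PAST_Y = 1   # c: drive past both


def planning_payoff(p):
    """Expected payoff if the driver continues with probability p at every
    intersection (he cannot tell X from Y, so p must be the same at both)."""
    return ((1 - p) * EXIT_AT_X
            + p * (1 - p) * EXIT_AT_Y
            + p * p * CONTINUE_PAST_Y)


def planning_optimal_p():
    # Payoff is a + (b - a) p + (c - b) p^2; setting the derivative to zero
    # gives p = (b - a) / (2 (b - c)). It is a maximum because c < b here.
    return Fraction(EXIT_AT_Y - EXIT_AT_X, 2 * (EXIT_AT_Y - CONTINUE_PAST_Y))


if __name__ == "__main__":
    p_star = planning_optimal_p()
    print(p_star, planning_payoff(p_star))  # 2/3 4/3
```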
In problems like this one, UDT is essentially equivalent to planning-optimality. So why did the authors propose and argue for action-optimality despite its downsides ..., instead of the alternative solution of simply remembering or recomputing the planning-optimal decision at each intersection and carrying it out?
But, as anyone who reads the paper carefully should see, they weren't arguing for action-optimality as the solution. They never abandoned planning-optimality. Their point is that if you insist on reasoning in this way (and Selten's notion of "subgame perfection" suggests some reasons why you might!), then the algorithm they call "action-optimality" is the way to go about it.
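One way to see that action-optimality, done carefully, need not conflict with planning-optimality is to test the planning-optimal strategy against the action-optimality condition. Under the payoffs usually used for this example (0, 4, 1), that strategy is to continue with probability 2/3. The Python sketch is a hedged reconstruction, not the paper's own formulation. It assumes beliefs consistent with the strategy, P(at X) = 1/(1+p), and asks whether changing behavior at the current intersection alone can help. `deviation_slope` is a hypothetical name.

```python
from fractions import Fraction

# Assumed payoffs (the ones usually used for this example, not given here):
EXIT_AT_X, EXIT_AT_Y, CONTINUE_PAST_Y = 0, 4, 1


def deviation_slope(p):
    """Slope in q of the driver's expected payoff when he continues with
    probability q at the intersection he is at now, while his behavior at
    the other intersection stays p.

    Beliefs are assumed consistent with p: X is always reached and Y is
    reached with probability p, so P(at X) = 1 / (1 + p). The payoff is
    linear in q, so zero slope means no change at this intersection helps.
    """
    alpha = 1 / (1 + p)
    # At X: exiting gives a; continuing leads to Y, where p applies.
    gain_at_x = (1 - p) * EXIT_AT_Y + p * CONTINUE_PAST_Y - EXIT_AT_X
    # At Y: continuing gives c instead of b.
    gain_at_y = CONTINUE_PAST_Y - EXIT_AT_Y
    return alpha * gain_at_x + (1 - alpha) * gain_at_y


if __name__ == "__main__":
    print(deviation_slope(Fraction(2, 3)))  # 0: the planning-optimal p passes
    print(deviation_slope(Fraction(1, 2)))  # 2/3: continuing more would help
```

Under these assumptions the slope vanishes at p = 2/3, so the planning-optimal strategy also satisfies the action-optimality condition.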
But Wei-Dai doesn't get this. Instead we get this analysis of how these brilliant people just haven't had the educational advantages that LW folks have:
Well, the authors don't say (they never bothered to argue against it), but I'm going to venture some guesses:
- That solution is too simple and obvious, and you can't publish a paper arguing for it.
- It disregards "the probability of being at X", which intuitively ought to play a role.
- The authors were trying to figure out what is rational for human beings, and that solution seems too alien for us to accept and/or put into practice.
- The authors were not thinking in terms of an AI, which can modify itself to use whatever decision theory it wants to.
- Aumann is known for his work in game theory. The action-optimality solution looks particularly game-theory like, and perhaps appeared more natural than it really is because of his specialized knowledge base.
- The authors were trying to solve one particular case of time inconsistency. They didn't have all known instances of time/dynamic/reflective inconsistencies/paradoxes/puzzles laid out in front of them, to be solved in one fell swoop.
Taken together, these guesses perhaps suffice to explain the behavior of these professional rationalists, without needing to hypothesize that they are "crazy". Indeed, many of us are probably still not fully convinced by UDT for one or more of the above reasons.
Let me just point out that the reason it is true that "they never argued against it" is that they had already argued for it. Check out the implications of their footnote #4!
Ok, those are the facts, as I see them. Was Wei-Dai snarky? I suppose it depends on how you define snarkiness. Taboo "snarky". I think that he was overbearingly condescending without the slightest real reason for thinking himself superior. "Snarky" may not be the best one-word encapsulation of that attitude, but it is the one I chose. I am unapologetic. Wei-Dai somehow came to believe himself better able to see the truth than a Nobel laureate in the Nobel laureate's field. It is a mistake he would not have made had he simply read a textbook or taken a one-semester course in the field. But I'm coming to see it as a mistake made frequently by SIAI insiders.
Let me point out that the problem of forgetful agents may seem artificial, but it is actually extremely important. An agent with perfect recall playing the iterated PD, knowing that it is to be repeated exactly 100 times, should rationally choose to defect. On the other hand, if he cannot remember how many iterations remain to be played, and knows that the other player cannot remember either, he should cooperate by playing Tit-for-Tat or something similar.
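The contrast can be sketched in Python under textbook assumptions that the comment does not state. The sketch uses PD payoffs T=5, R=3, P=1, S=0. For the forgetful case it uses the common substitute of an indefinite horizon, in which play continues after each round with probability d. That substitute is not the same as modelling imperfect recall. It also uses grim trigger rather than Tit-for-Tat, because the threshold is simpler. The function names are hypothetical.

```python
from fractions import Fraction

# Textbook Prisoner's Dilemma payoffs (an assumption): T > R > P > S.
T, R, P, S = 5, 3, 1, 0
# My stage payoff, keyed by (my_move, their_move).
STAGE = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def stage_equilibrium_move():
    """Symmetric pure equilibrium of a single round: m is a best reply to m."""
    for m in "CD":
        if max("CD", key=lambda x: STAGE[(x, m)]) == m:
            return m
    raise ValueError("no symmetric pure equilibrium")


def backward_induction(rounds):
    """Known horizon, perfect recall. Start at the last round: nothing follows,
    so the stage equilibrium is played. The value of later rounds therefore
    does not depend on today's moves, and the same argument applies one round
    earlier, and so on back to round 1."""
    plan, value = [], 0
    for _ in range(rounds):          # last round first
        m = stage_equilibrium_move()
        plan.append(m)
        value += STAGE[(m, m)]
    return plan[::-1], value


def cooperation_threshold():
    """Indefinite horizon (a swapped-in model): after each round play
    continues with probability d. Against grim trigger, cooperating forever
    is worth R / (1 - d) and defecting once is worth T + d * P / (1 - d),
    so cooperation is sustainable iff d >= (T - R) / (T - P)."""
    return Fraction(T - R, T - P)


if __name__ == "__main__":
    plan, value = backward_induction(100)
    print(plan.count("D"), value)   # 100 100
    print(cooperation_threshold())  # 1/2
```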
Well, that is my considered response on "snarkiness". I still have to respond on some other points, and I suspect that, upon consideration, I am going to have to eat some crow. But I'm not backing down on this narrow point. Wei-Dai blew it in interpreting Aumann's paper. (And also, other people who know some game theory should read the paper and savor the implications of footnote #4. It is totally cool.)
Preliminary notes: You can call me "Wei Dai" (that's firstname lastname). "He" is ok. I have taken a graduate-level course in game theory (where I got a 4.0 grade, in case you suspect that I coasted through it), and have Fudenberg and Tirole's "Game Theory" and Joyce's "Foundations of Causal Decision Theory" as two of the few physical books that I own.
...Their point is that if you insist on reasoning in this way, (and Selten's notion of "subgame perfection" suggests some reasons why you might!) then the algo
A common background assumption on LW seems to be that it's rational to act in accordance with the dispositions one would wish to have. (Rationalists must WIN, and all that.)
E.g., Eliezer:
And more recently, from AdamBell:
Within academic philosophy, this is the position advocated by David Gauthier. Derek Parfit has constructed some compelling counterarguments against Gauthier, so I thought I'd share them here to see what the rest of you think.
First, let's note that there definitely are possible cases where it would be "beneficial to be irrational". For example, suppose an evil demon ('Omega') will scan your brain, assess your rational capacities, and torture you iff you surpass some minimal baseline of rationality. In that case, it would very much be in your interests to fall below the baseline! Or suppose you're rewarded every time you honestly believe the conclusion of some fallacious reasoning. We can easily multiply cases here. What's important for now is just to acknowledge this phenomenon of 'beneficial irrationality' as a genuine possibility.
This possibility poses a problem for the Eliezer-Gauthier methodology. (Quoting Eliezer again:)
The problem, obviously, is that it's possible for irrational agents to receive externally-generated rewards for their dispositions, without this necessarily making their downstream actions any more 'reasonable'. (At this point, you should notice the conflation of 'disposition' and 'choice' in the first quote from Eliezer. Rachel does not envy Irene her choice at all. What she wishes is to have the one-boxer's dispositions, so that the predictor puts a million in the first box, and then to confound all expectations by unpredictably choosing both boxes and reaping the most riches possible.)
To illustrate, consider (a variation on) Parfit's story of the threat-fulfiller and threat-ignorer. Tom has a transparent disposition to fulfill his threats, no matter the cost to himself. So he straps on a bomb, walks up to his neighbour Joe, and threatens to blow them both up unless Joe shines his shoes. Seeing that Tom means business, Joe sensibly gets to work. Not wanting to repeat the experience, Joe later goes and pops a pill to acquire a transparent disposition to ignore threats, no matter the cost to himself. The next day, Tom sees that Joe is now a threat-ignorer, and so leaves him alone.
So far, so good. It seems this threat-ignoring disposition was a great one for Joe to acquire. Until one day... Tom slips up. Due to an unexpected mental glitch, he threatens Joe again. Joe follows his disposition and ignores the threat. BOOM.
Here Joe's final decision seems as disastrously foolish as Tom's slip-up. It was good to have the disposition to ignore threats, but that doesn't necessarily make it a good idea to act on it. We need to distinguish the desirability of a disposition to X from the rationality of choosing to do X.
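Parfit's distinction can be made concrete with a small expected-cost model in Python. Every number is an assumption for illustration: a cost of 1 for shining shoes, a cost of 100 for the explosion, and a per-day glitch probability. The function names are also assumptions. The same choice, ignoring the threat, is evaluated twice. First it is evaluated as a disposition, before any threat, averaging over Tom's glitches. Then it is evaluated as an act, after the threat has been made.

```python
from fractions import Fraction

# Hypothetical costs to Joe (made up for illustration).
SHINE_COST = 1     # complying with one threat
BOOM_COST = 100    # the bomb going off


def daily_cost_of_disposition(ignorer, glitch_rate):
    """Ex ante expected daily cost of a transparent disposition.

    Assumes a rational Tom threatens a complier every day and never threatens
    a transparent ignorer, except through a glitch with probability
    `glitch_rate` per day, which the ignorer then ignores.
    """
    return glitch_rate * BOOM_COST if ignorer else SHINE_COST


def cost_of_act_once_threatened(ignore):
    """Ex post cost of the act itself, after a threat has already been made."""
    return BOOM_COST if ignore else SHINE_COST


if __name__ == "__main__":
    glitch = Fraction(1, 1000)
    # Judged as a disposition, ignoring is cheaper (1/10 < 1) ...
    print(daily_cost_of_disposition(True, glitch),
          daily_cost_of_disposition(False, glitch))   # 1/10 1
    # ... but judged as an act, once threatened, ignoring is far worse.
    print(cost_of_act_once_threatened(True),
          cost_of_act_once_threatened(False))         # 100 1
```

Under these numbers, the ignorer disposition beats the complier disposition whenever the glitch rate is below SHINE_COST / BOOM_COST = 1/100. The act of ignoring a threat that has already been made is always worse.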