"Due to an unexpected mental glitch, he threatens Joe again. Joe follows his disposition and ignores the threat. BOOM. Here Joe's final decision seems as disastrously foolish as Tom's slip up."
But of course, the initial decision to take the pill may be rational, and the "final decision" is so constrained that we might regard it as a "decision" in name only. The way I see it: when Joe takes the pill, he deters rational versions of Tom from threatening him, which benefits him, but he is worse off whenever an irrational version of Tom threatens him anyway. Whether taking the pill is rational depends on how many of the Toms out there he expects to be rational and how many irrational, as well as on the relative costs of being forced to shine shoes and of being blown up. If Toms tend to be rational, and shining shoes is unpleasant enough, taking the pill may be rational.
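To make that trade-off concrete, here is a minimal sketch of the expected-cost comparison, assuming the decision hinges only on the probabilities and costs mentioned above; every specific number is made up purely for illustration, not anything from Parfit or the comment:

```python
# Illustrative expected-cost comparison for Joe's decision to take the pill.
# All numbers are hypothetical assumptions, chosen only to expose the
# structure of the trade-off described above.

p_threat       = 0.5     # chance that some Tom threatens Joe at all
p_irrational   = 0.001   # fraction of threatening Toms who ignore the deterrent
cost_shoes     = 1.0     # disutility of being forced to shine shoes
cost_explosion = 100.0   # disutility of being blown up

# Without the pill, every threat succeeds and Joe shines shoes.
expected_cost_no_pill = p_threat * cost_shoes

# With the pill, rational Toms are deterred; only irrational Toms still
# threaten, and Joe's disposition then makes him ignore the threat: BOOM.
expected_cost_pill = p_threat * p_irrational * cost_explosion

print(f"expected cost, no pill: {expected_cost_no_pill:.3f}")  # 0.500
print(f"expected cost, pill:    {expected_cost_pill:.3f}")     # 0.050
# With these numbers the pill wins; raise p_irrational or cost_explosion
# enough and the comparison flips, which is exactly the dependence noted above.
```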
This kind of scenario has made me wonder in the past: could this have contributed to some of our emotional tendencies? At times we experience emotions that override our rational behavior. Anger is a good example, though gratitude might be as well. There may be times when it is just not rational, in terms of reward and cost, to hit back at someone who has wronged us, but we may do so anyway because we are angry. However, if we never got angry and acted rationally all the time, we might be easy targets for people who know they can wrong us and then retreat to some safe situation where revenge would be irrational. Something that reduces our rationality, so that we retaliate even when it is not in our interests, might, almost paradoxically, be good for us, because it makes attacking us in the first place less rational. Maybe anger is partly there for that reason - literally to ensure that we will do things that could get us killed in order to hit back at someone, as a deterrent.
Of course, someone could ask how people are supposed to know we have that tendency - but when people saw anger working in themselves and in others, they would generally get the idea - they would understand the consequences of reduced rationality in those situations. It could be argued that the best strategy is to fake the ability to become angry: become angry in trivial situations, where the cost of the anger is minimal, while acting rationally in the extreme situations where you are likely to get killed. A problem with this is that it is more complicated behavior, so we might assume it would be harder for it to evolve in the first place. There would presumably be some balance between real deterrence and fake deterrence at work here.
I can think of real-world examples of this "pill". I believe there is a wealthy person who is said to have told his family that, if he were kidnapped, no ransom was to be paid under any circumstances. Now, clearly, his family are likely to ignore that and pay: once he has been kidnapped, any deterrence has already failed, and the rational thing is to save his life. That suggests he may have taken precautions: he may have done his best to make it impossible for his family to pay a ransom at all.
We aren't transparent. The only reason to fulfill our threats is so that people will later know that we will, in which case it's totally rational by any decision theory.
A common background assumption on LW seems to be that it's rational to act in accordance with the dispositions one would wish to have. (Rationalists must WIN, and all that.)
E.g., Eliezer:
And more recently, from AdamBell:
Within academic philosophy, this is the position advocated by David Gauthier. Derek Parfit has constructed some compelling counterarguments against Gauthier, so I thought I'd share them here to see what the rest of you think.
First, let's note that there definitely are possible cases where it would be "beneficial to be irrational". For example, suppose an evil demon ('Omega') will scan your brain, assess your rational capacities, and torture you iff you surpass some minimal baseline of rationality. In that case, it would very much be in your interests to fall below the baseline! Or suppose you're rewarded every time you honestly believe the conclusion of some fallacious reasoning. We can easily multiply cases here. What's important for now is just to acknowledge this phenomenon of 'beneficial irrationality' as a genuine possibility.
This possibility poses a problem for the Eliezer-Gauthier methodology. (Quoting Eliezer again:)
The problem, obviously, is that it's possible for irrational agents to receive externally-generated rewards for their dispositions, without this necessarily making their downstream actions any more 'reasonable'. (At this point, you should notice the conflation of 'disposition' and 'choice' in the first quote from Eliezer. Rachel does not envy Irene her choice at all. What she wishes is to have the one-boxer's dispositions, so that the predictor puts a million in the first box, and then to confound all expectations by unpredictably choosing both boxes and reaping the most riches possible.)
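For readers who want the Rachel/Irene point spelled out numerically, here is a small sketch using the conventional Newcomb payoffs (the $1,000,000 and $1,000 figures are the standard ones for the problem, assumed here rather than taken from the post); it shows how the reward tracks the predicted disposition rather than the eventual choice:

```python
# Conventional Newcomb payoffs, to spell out why rewarding a *disposition*
# differs from rewarding a *choice*. The dollar amounts are the standard
# figures for the problem, assumed for illustration.

BIG, SMALL = 1_000_000, 1_000

def payoff(predicted_one_boxer: bool, takes_both_boxes: bool) -> int:
    """The first box is filled according to the predicted disposition;
    the actual choice only decides whether the second box's contents are added."""
    first_box = BIG if predicted_one_boxer else 0
    second_box = SMALL if takes_both_boxes else 0
    return first_box + second_box

print(payoff(True, False))   # Irene: predicted one-boxer, one-boxes -> 1000000
print(payoff(True, True))    # Rachel's wish: predicted one-boxer, two-boxes -> 1001000
print(payoff(False, True))   # ordinary two-boxer, predicted as such -> 1000
```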
To illustrate, consider (a variation on) Parfit's story of the threat-fulfiller and threat-ignorer. Tom has a transparent disposition to fulfill his threats, no matter the cost to himself. So he straps on a bomb, walks up to his neighbour Joe, and threatens to blow them both up unless Joe shines his shoes. Seeing that Tom means business, Joe sensibly gets to work. Not wanting to repeat the experience, Joe later goes and pops a pill to acquire a transparent disposition to ignore threats, no matter the cost to himself. The next day, Tom sees that Joe is now a threat-ignorer, and so leaves him alone.
So far, so good. It seems this threat-ignoring disposition was a great one for Joe to acquire. Until one day... Tom slips up. Due to an unexpected mental glitch, he threatens Joe again. Joe follows his disposition and ignores the threat. BOOM.
Here Joe's final decision seems as disastrously foolish as Tom's slip-up. It was good to have the disposition to ignore threats, but that doesn't necessarily make it a good idea to act on it. We need to distinguish the desirability of a disposition to X from the rationality of choosing to do X.