Follow-up to: Normative uncertainty in Newcomb's problem
Philosophers and atheists break for two-boxing; theists and Less Wrong break for one-boxing
Personally, I would one-box on Newcomb's Problem. Conditional on choosing for lawful reasons, one-boxing earns $1,000,000 while two-boxing delivers only $1,000. But this seems to be firmly a minority view in philosophy, and numerous heuristics about expert opinion suggest that I should re-examine the view.
In the PhilPapers survey, Philosophy undergraduates start off divided roughly evenly between one-boxing and two-boxing:
Newcomb's problem: one box or two boxes?
Other | 142 / 217 (65.4%)
Accept or lean toward: one box | 40 / 217 (18.4%)
Accept or lean toward: two boxes | 35 / 217 (16.1%)
But philosophy faculty, who have learned more (and are less likely to have no opinion) and have been subject to further selection, break in favor of two-boxing:
Newcomb's problem: one box or two boxes?
Other | 441 / 931 (47.4%)
Accept or lean toward: two boxes | 292 / 931 (31.4%)
Accept or lean toward: one box | 198 / 931 (21.3%)
Specialists in decision theory (who are also more atheistic, more compatibilist about free will, and more physicalist than faculty in general) are even more convinced:
Newcomb's problem: one box or two boxes?
Accept or lean toward: two boxes | 19 / 31 (61.3%)
Accept or lean toward: one box | 8 / 31 (25.8%)
Other | 4 / 31 (12.9%)
Looking at the correlates of answers about Newcomb's problem, two-boxers are more likely to believe in physicalism about consciousness, atheism about religion, and other positions generally popular around here (which are also usually, but not always, in the direction of philosophical opinion). Zooming in on one correlate: most theists with an opinion are one-boxers, while atheists break for two-boxing:
Newcomb's problem: two boxes | correlation 0.125 (655 response pairs, p = 0.001)
Less Wrong breaks overwhelmingly for one-boxing in the 2012 survey:
NEWCOMB'S PROBLEM
One-box: 726, 61.4%
Two-box: 78, 6.6%
Not sure: 53, 4.5%
Don't understand: 86, 7.3%
No answer: 240, 20.3%
When I elicited LW confidence levels in a poll, a majority indicated 99%+ confidence in one-boxing, and 77% of respondents indicated 80%+ confidence.
What's going on?
I would like to understand what is driving this difference of opinion. My poll was a (weak) test of the hypothesis that Less Wrongers were more likely to account for uncertainty about decision theory: since on the standard Newcomb's problem one-boxers get $1,000,000, while two-boxers get $1,000, even a modest credence in the correct theory recommending one-boxing could justify the action of one-boxing.
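The stakes asymmetry driving that argument can be made explicit with a back-of-the-envelope calculation (a minimal sketch assuming the standard $1,000,000 / $1,000 payoffs; the credence values plugged in are purely illustrative):

```python
def one_boxing_favoured(p: float) -> bool:
    """p = credence that the correct decision theory recommends one-boxing.

    If that theory is right, one-boxing gains $999,000 over two-boxing
    ($1,000,000 vs $1,000); if the two-boxing theory is right, two-boxing
    gains only the $1,000 causal bonus. Compare the expected gains.
    """
    gain_if_one_boxing_correct = 999_000
    gain_if_two_boxing_correct = 1_000
    return p * gain_if_one_boxing_correct > (1 - p) * gain_if_two_boxing_correct

# Under these numbers, one-boxing maximizes expected dollars for any
# credence above 1/1000 -- hence "even a modest credence" suffices.
print(one_boxing_favoured(0.01))    # modest credence
print(one_boxing_favoured(0.0001))  # very small credence
```

The break-even point here is exactly p = 0.001, which is why the argument only needs a "modest" credence in the one-boxing theory rather than confidence in it.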
If new graduate students read the computer science literature on program equilibrium, including some local contributions like Robust Cooperation in the Prisoner's Dilemma and A Comparison of Decision Algorithms on Newcomblike Problems, I would guess they would tend to shift more towards one-boxing. Thinking about what sort of decision algorithms it is rational to program, or what decision algorithms would prosper over numerous one-shot Prisoner's Dilemmas with visible source code, could also shift intuitions. A number of philosophers I have spoken with have indicated that frameworks like the use of causal models with nodes for logical uncertainty are meaningful contributions to thinking about decision theory. However, I doubt that for those with opinions, the balance would swing from almost 3:1 for two-boxing to 9:1 for one-boxing, even concentrating on new decision theory graduate students.
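The flavor of those program-equilibrium results can be conveyed with a toy "cooperate only with exact copies of yourself" bot (a hypothetical sketch: the papers cited use proof search rather than literal source comparison, and the payoff matrix below is just a conventional one-shot Prisoner's Dilemma):

```python
def clique_bot(my_source: str, opponent_source: str) -> str:
    # Cooperate iff the opponent's source code is an exact copy of ours;
    # defect against everything else.
    return "C" if opponent_source == my_source else "D"

# (row payoff, column payoff) for a standard Prisoner's Dilemma.
PAYOFFS = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
           ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

def play(src_a: str, src_b: str) -> tuple:
    """Run one round between two clique-bots given their visible source."""
    move_a = clique_bot(src_a, src_b)
    move_b = clique_bot(src_b, src_a)
    return PAYOFFS[(move_a, move_b)]

print(play("bot", "bot"))    # identical code: mutual cooperation, (2, 2)
print(play("bot", "other"))  # different code: mutual defection, (1, 1)
```

Even this crude strategy achieves mutual cooperation against copies of itself while remaining unexploitable, which is the intuition pump for thinking about which decision algorithms it is rational to program.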
On the other hand, there may be an effect of unbalanced presentation to non-experts. Less Wrong is on average less philosophically sophisticated than professional philosophers. Since philosophical training is associated with a shift towards two-boxing, some of the difference in opinion could reflect a difference in training. Moreover, postings here on decision theory have almost all either argued for or assumed one-boxing as the correct response to Newcomb's problem. It might be that if academic decision theorists were making arguments for two-boxing here, or if there were a reduction in pro-one-boxing social pressure, Less Wrong opinion would shift towards two-boxing.
Less Wrongers, what's going on here? What are the relative causal roles of these and other factors in this divergence?
ETA: The SEP article on Causal Decision Theory.
No, my point is that TDT, as a theory, maximizes over a space of decisions, not a space of algorithms, and in holding TDT to be rational, I am not merely holding it to occupy the most rational point in the space of algorithms, but saying that on its target problem class, TDT's output is indeed always the most rational decision within the space of decisions. I simply don't believe that it's particularly rational to maximize over only the physical consequences of an act in a problem where the payoff is determined significantly by logical consequences of your algorithm's output, such as Omega's prediction of your output, or cohorts who will decide similarly to you. Your algorithm can choose to have any sort of decision-type it likes, so it should choose the decision-type with the best payoff. There is just nothing rational about blindly shutting your eyes to logical consequences and caring only about physical consequences, any more than there's something rational about caring only about causes that work through blue things instead of red things. None of this discussion is taking place inside a space of algorithms rather than a space of decisions or object-level outputs.
The fact that you would obviously choose a non-CDT algorithm at the meta-level, on a fair problem in which payoffs have no lexical dependence on algorithms' exact code apart from their outputs, is, on my view, a strong indictment of the rationality of CDT. But the justification for TDT is not that it is what CDT would choose at the meta-level. At the meta-level, if CDT self-modifies at 7pm, it will modify to a new algorithm which one-boxes whenever Omega has glimpsed its source code after 7pm but two-boxes if Omega saw its code at 6:59pm, even though Omega is taking the self-modification into account in its prediction. Since the original CDT is not reflectively consistent on a fair problem, it must be wrong. Insofar as TDT chooses TDT when offered a chance to self-modify on problems within its problem space, it is possible that TDT is right for that problem space.
But the main idea is just that TDT is directly outputting the rational action, because we want the giant heap of money and not to stick to this strange, ritual concept of the 'rational' decision being the one that cares only about causal consequences and not logical consequences. We need not be driven to increasingly contorted redefinitions of winning in order to say that the two-boxer is winning despite not being rich, or that the problem is unfair to rationalists despite Omega not caring why you chose what you chose and your algorithm being allowed to choose any decision-type it likes. In my mental world, the object-level output of TDT just is the right thing to do, and so there is no need to go meta.
I would also expect this to be much the same reasoning, at a lower level of sophistication, for more casual LW readers; I don't think they'd be going meta to justify TDT from within CDT, especially since the local writeups of TDT also justified themselves at the object level, not the meta level.
As you say, academic decision theorists do indeed have Parfit on rational irrationality and a massive literature thereupon, which we locally throw completely out the window. It's not a viewpoint well suited to self-modifying AI, where the notion of programmers working from a theory which says their AI will end up 'rationally irrational' ought to give you the screaming willies. Even a threat like "I will nuke us both if you don't give me all your lunch money!", if it works on the blackmailee, can be seen as maximizing over a matrix of strategic responses while the opponent maximizes only over their own action, without considering how you're considering their reactions. We can do anything a Parfitian rational irrationalist can do from within a single theory, and we are never pressured to call that theory's advice irrational, nor forced to resort to precommitment.
Interesting. I have a better grasp of what you're saying now (or maybe not what you're saying, but why someone might think that what you're saying is true). Rapid responses to information that needs digesting are unhelpful, so I have nothing further to say for now (though I still think my original post goes some way to explaining the opinions of those on LW who haven't thought in detail about decision theory: a focus on algorithms rather than decisions means that people think one-boxing is rational even if they don't agree with your claims about focusing o...