Eliezer_Yudkowsky comments on Why do theists, undergrads, and Less Wrongers favor one-boxing on Newcomb? - Less Wrong

15 Post author: CarlShulman 19 June 2013 01:55AM




Comment author: Qiaochu_Yuan 19 June 2013 04:31:57AM *  18 points [-]

I've been reading a little of the philosophical literature on decision theory lately, and at least some two-boxers have an intuition that I hadn't thought about before that Newcomb's problem is "unfair." That is, for a wide range of pairs of decision theories X and Y, you could imagine a problem which essentially takes the form "Omega punishes agents who use decision theory X and rewards agents who use decision theory Y," and this is not a "fair" test of the relative merits of the two decision theories.

The idea that rationalists should win, in this context, has a specific name: it's called the Why Ain'cha Rich defense, and I think what I've said above is the intuition powering counterarguments to it.

I'm a little more sympathetic to this objection than I was before delving into the literature. A complete counterargument to it should at least attempt to define what fair means and argue that Newcomb is in fact a fair problem. (This seems related to the issue of defining what a fair opponent is in modal combat.)

Comment author: Eliezer_Yudkowsky 19 June 2013 04:28:33PM 30 points [-]

TDT's reply to this is a bit more specific.

Informally: Since Omega represents a setup which rewards agents who make a certain decision X, and reality doesn't care why or by what exact algorithm you arrive at X so long as you arrive at X, the problem is fair. Unfair would be "We'll examine your source code and punish you iff you're a CDT agent, but we won't punish another agent who two-boxes as the output of a different algorithm even though your two algorithms had the same output." The problem should not care whether you arrive at your decisions by maximizing expected utility or by picking the first option in English alphabetical order, so long as you arrive at the same decision either way.
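This fairness criterion can be sketched as a toy program (hypothetical names; a minimal illustration, not anything from the thread): Omega's payoff function sees only the agent's output, so two entirely different algorithms with the same output receive the same reward.

```python
# Toy Newcomb setup: Omega is modeled as a perfect predictor that fills
# the opaque box based only on the agent's output, never on its source.
def omega_payoff(agent):
    prediction = agent()           # perfect prediction, modeled by running the agent
    box_b = 1_000_000 if prediction == "one-box" else 0
    actual = agent()
    if actual == "one-box":
        return box_b
    return box_b + 1_000           # two-boxing adds the transparent $1000

# Two very different "algorithms" that happen to share an output:
def maximize_expected_utility():
    return "one-box"

def alphabetical_first():
    # picks the first option in English alphabetical order
    return sorted(["one-box", "two-box"])[0]

# A "fair" problem cannot tell them apart:
assert omega_payoff(maximize_expected_utility) == omega_payoff(alphabetical_first)
```

An "unfair" variant would inspect the agent's source (e.g. its `__name__` or code object) before assigning payoffs, which is exactly the case excluded above.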

More formally: TDT corresponds to maximizing on the class of problems whose payoff is determined by 'the sort of decision you make in the world that you actually encounter, having the algorithm that you do'. CDT corresponds to maximizing over a fair problem class consisting of scenarios whose payoff is determined only by your physical act, and would be a good strategy in the real world if no other agent ever had an algorithm similar to yours (you must be the only CDT-agent in the universe, so that your algorithm only acts at one physical point) and no other agent could gain any info about your algorithm except by observing your controllable physical acts (tallness being correlated with intelligence is not allowed). UDT allows for maximizing over classes of scenarios where your payoff can depend on actions you would have taken in universes you could have encountered but didn't, i.e., the Counterfactual Mugging. (Parfit's Hitchhiker is outside TDT's problem class, and in UDT's, because the car-driver asks "What will this hitchhiker do if I take them to town?", so that a dishonorable hitchhiker who is left in the desert is getting a payoff which depends on what they would have done in a situation they did not actually encounter. Likewise the transparent Newcomb's Box. We can clearly see how to maximize on the problem, but it's in UDT's class of 'fair' scenarios, not TDT's.)

If the scenario handed to the TDT algorithm is that only one copy of your algorithm exists within the scenario, acting at one physical point, and no other agent in the scenario has any knowledge of your algorithm apart from acts you can maximize over, then TDT reduces to CDT and outputs the same action as CDT. This is implied by CDT maximizing over its problem class, together with TDT's class of 'fair' problems strictly including all CDT-fair problems.

If Omega rewards having particular algorithms independently of their outputs, by examining the source code without running it, the only way to maximize is to have the most rewarded algorithm regardless of its output. But this is uninteresting.

If a setup rewards some algorithms more than others because of their different outputs, this is just life. You might as well claim that a cliff punishes people who rationally choose to jump off it.

This situation is interestingly blurred in modal combat where an algorithm may perhaps do better than another because its properties were more transparent (more provable) to another algorithm examining it. Of this I can only say that if, in real life, we end up with AIs examining each other's source code and trying to prove things about each other, calling this 'unfair' is uninteresting. Reality is always the most important domain to maximize over.

Comment author: [deleted] 19 June 2013 06:11:02PM *  7 points [-]

I'd just like to say that this comparison of CDT, TDT, and UDT was a very good explanation of the differences. Thanks for that.

Comment author: ESRogs 19 June 2013 10:12:21PM 4 points [-]

Agreed. Found the distinction between TDT and UDT especially clear here.

Comment author: ArisKatsaris 20 June 2013 01:21:07AM *  4 points [-]

This explanation makes UDT seem strictly more powerful than TDT (if UDT can handle Parfit's Hitchhiker and TDT can't).

If that's the case, then is there a point in still focusing on developing TDT? Is it meant as just a stepping stone to an even better decision theory (possibly UDT itself) down the line? Or do you believe UDT's advantages to be counterbalanced by disadvantages?

Comment author: Eliezer_Yudkowsky 20 June 2013 01:23:29AM 7 points [-]

UDT doesn't handle non-base-level maximization vantage points (previously "epistemic vantage points") for blackmail - you can blackmail a UDT agent because it assumes your strategy is fixed, and doesn't realize you're only blackmailing it because you're simulating it being blackmailable. As currently formulated UDT is also non-naturalistic and assumes the universe is divided into a not-you environment and a UDT algorithm in a Cartesian bubble, which is something TDT is supposed to be better at (though we don't actually have good fill-in for the general-logical-consequence algorithm TDT is supposed to call).

I expect the ultimate theory to look more like "TDT modded to handle UDT's class of problems and blackmail and anything else we end up throwing at it" than "UDT modded to be naturalistic and etc", but I could be wrong - others have different intuitions about this.

Comment author: Wei_Dai 20 June 2013 06:09:10AM 7 points [-]

As currently formulated UDT is also non-naturalistic and assumes the universe is divided into a not-you environment and a UDT algorithm in a Cartesian bubble, which is something TDT is supposed to be better at (though we don't actually have good fill-in for the general-logical-consequence algorithm TDT is supposed to call).

UDT was designed to move away from the kind of Cartesian dualism as represented in AIXI. I don't understand where it's assuming its own Cartesian bubble. Can you explain?

Comment author: Eliezer_Yudkowsky 20 June 2013 04:04:00PM 0 points [-]

The version I saw involved a Universe computation which accepts an Agent function and then computes itself, with the Agent making its choices based on its belief about the Universe? That seemed to me like a pretty clean split.

Comment author: cousin_it 21 June 2013 08:47:08AM *  5 points [-]

No, the version we've been discussing for the last several years involves an argumentless Universe function that contains the argumentless Agent function as a part. Agent knows the source code of Agent (via quining) and the source code of Universe, but does not a priori know which part of the Universe is the Agent. The code of Universe might be mixed up so it's hard to pick out copies of Agent. Then Agent tries to prove logical statements of the form "if Agent returns a certain value, then Universe returns a certain value". As you can see, that automatically takes into account the logical correlates of Agent as well.
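A heavily simplified sketch of this setup (hypothetical; it fakes the quining and the theorem prover, which are where all the real difficulty lives): the agent searches for implications of the form "Agent() == a implies Universe() == u" and picks the action with the best provable consequence.

```python
# Toy model of the logic-based UDT construction. Real versions use a
# proof searcher over quined source code; here "provability" is faked
# with a lookup table of implications about a Newcomb-like Universe.
ACTIONS = ["one-box", "two-box"]

# Stand-in for provable statements "Agent() == a  ->  Universe() == u":
PROVABLE_CONSEQUENCE = {
    "one-box": 1_000_000,
    "two-box": 1_000,
}

def agent():
    # Choose the action whose provable consequence is best. Because the
    # implications are about Agent's *output*, logical correlates of the
    # agent elsewhere in Universe are automatically accounted for.
    return max(ACTIONS, key=lambda a: PROVABLE_CONSEQUENCE[a])

assert agent() == "one-box"
```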

Comment author: ArisKatsaris 21 June 2013 10:39:55AM *  6 points [-]

I find it rather disappointing that the UDT people and the TDT people have seemingly not been communicating very efficiently with each other in the last few years...

Comment author: Wei_Dai 21 June 2013 04:30:25PM 4 points [-]

I think what has happened is that most of the LW people working on decision theory in the past few years have been working with different variations on UDT, while Eliezer hasn't participated much in the discussions due to being preoccupied with other projects. It seems understandable that he saw some ideas that somebody was playing with, and thought that everyone was assuming something similar.

Comment author: lukeprog 25 June 2013 11:46:16PM 1 point [-]

Yes. And now, MIRI is planning a decision theory workshop (for September) so that some of this can be hashed out.

Comment author: Tyrrell_McAllister 20 June 2013 07:05:01PM 2 points [-]

UDT can be modeled with a Universe computation that takes no arguments.

Comment author: Wei_Dai 20 June 2013 06:53:16PM 2 points [-]

I think you must have been looking at someone else's idea. None of the versions of UDT that I've proposed are like this. See my original UDT post for the basic setup, which all of my subsequent proposals share.

Comment author: Eliezer_Yudkowsky 20 June 2013 07:27:03PM 1 point [-]

"The answer is, we can view the physical universe as a program that runs S as a subroutine, or more generally, view it as a mathematical object which has S embedded within it." A big computation with embedded discrete copies of S seems to me like a different concept from doing logical updates on a big graph with causal and logical nodes, some of which may correlate to you even if they are not exact copies of you.

Comment author: Wei_Dai 20 June 2013 08:27:56PM 6 points [-]

The sentence you quoted was just trying to explain how "physical consequences" might be interpreted as "logical consequences" and therefore dealt with within the UDT framework (which doesn't natively have a concept of "physical consequences"). It wasn't meant to suggest that UDT only works if there are discrete copies of S in the universe.

In that same post I also wrote, "A more general class of consequences might be called logical consequences. Consider a program P’ that doesn’t call S, but a different subroutine S’ that’s logically equivalent to S. In other words, S’ always produces the same output as S when given the same input. Due to the logical relationship between S and S’, your choice of output for S must also affect the subsequent execution of P’. Another example of a logical relationship is an S' which always returns the first bit of the output of S when given the same input, or one that returns the same output as S on some subset of inputs."
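The S/S' relationship in that quote can be made concrete with a toy pair of functions (hypothetical names, just illustrating the quoted definition): distinct code, identical input-output behavior, so a program that only calls S' is nonetheless logically controlled by the choice of S's output.

```python
# S and S_prime are different pieces of code but logically equivalent:
# same output on every input.
def S(x):
    return x * 2

def S_prime(x):
    return x + x          # different source, same input-output behavior

def P_prime(x):
    # Never calls S, yet any choice of output for S constrains what
    # P_prime does, via the logical equivalence with S_prime.
    return S_prime(x) + 1

assert all(S(x) == S_prime(x) for x in range(100))
assert P_prime(5) == S(5) + 1
```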

I guess I didn't explicitly write about parts of the universe that "correlate to you" as opposed to having more exact logical relationships with you, but given how UDT is supposed to work, it was meant to just handle them naturally. At least I don't see why it wouldn't do so as well as TDT (assuming it had access to your "general-logical-consequence algorithm", which I'm guessing is the same thing as my "math intuition module").

Comment author: Benja 22 June 2013 05:11:34PM *  3 points [-]

FWIW, as far as I can remember I've always understood this the same way as Wei and cousin_it. (cousin_it was talking about the later logic-based work rather than Wei's original post, but that part of the idea is common between the two systems.) If the universe is a Game of Life automaton initialized with some simple configuration which, when run with unlimited resources and for a very long time, eventually by evolution and natural selection produces a structure that is logically equivalent to the agent's source code, that's sufficient for falling under the purview of the logic-based versions of UDT, and Wei's informal (underspecified) probabilistic version would not even require equivalence. There's nothing Cartesian about UDT.

Comment author: defectbot 26 June 2013 12:11:28AM 2 points [-]

UDT doesn't handle non-base-level maximization vantage points (previously "epistemic vantage points") for blackmail - you can blackmail a UDT agent because it assumes your strategy is fixed, and doesn't realize you're only blackmailing it because you're simulating it being blackmailable.

I'm not so sure about this one... It seems that UDT would be deciding "If blackmailed, pay or don't pay" without knowing whether it actually will be blackmailed yet. Assuming it knows the payoffs the other agent receives, it would reason "If I pay if blackmailed... I get blackmailed, whereas if I don't pay if blackmailed... I don't get blackmailed. I should therefore never pay if blackmailed", unless there's something I'm missing.
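The commenter's argument can be sketched with made-up payoffs (a hypothetical toy model, assuming the blackmailer simulates the agent's policy and only blackmails when it predicts payment): evaluating whole policies, "never pay" dominates.

```python
# Toy blackmail game with invented payoffs.
PAY_COST = 100          # cost of paying the blackmailer
THREAT_COST = 1_000     # cost of a carried-out threat

def blackmailer_acts(policy):
    # Blackmail is only worth issuing if the agent would pay.
    return policy("blackmailed") == "pay"

def outcome(policy):
    if blackmailer_acts(policy):
        return -PAY_COST if policy("blackmailed") == "pay" else -THREAT_COST
    return 0  # no blackmail ever occurs

pay = lambda observation: "pay"
refuse = lambda observation: "refuse"

assert outcome(refuse) == 0        # refuser is never blackmailed
assert outcome(pay) == -PAY_COST   # payer gets blackmailed and pays
```

Eliezer's objection upthread is that this only works when the blackmailer's strategy really is conditioned on the simulation; modeling a blackmailer whose strategy is *not* fixed is the part UDT is said to mishandle.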