Nick_Tarleton comments on Newcomb's Problem and Regret of Rationality - Less Wrong

68 Post author: Eliezer_Yudkowsky 31 January 2008 07:36PM

Comments (592)

Comment author: Nick_Tarleton 31 January 2008 08:28:53PM 13 points [-]

Either box B is already full or already empty.

I'm not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and Omega is just rewarding agents with irrational dispositions. This dominant view goes by the name of "causal decision theory".

I suppose causal decision theory assumes causality only works in one temporal direction. Confronted with a predictor that was right 100 out of 100 times, I would think it very likely that backward-in-time causation exists, and take only B. I assume this would, as you say, produce absurd results elsewhere.

Comment author: diegocaleiro 22 March 2010 07:28:42PM 28 points [-]

Decisions aren't physical.

The above statement is at least hard to defend. Your decisions are physical and occur inside of you... So these two-boxers are using the wrong model amongst these two (see the drawings....) http://lesswrong.com/lw/r0/thou_art_physics/

If you are a part of physics, so is your decision, so it must account for the correlation between your thought processes and the superintelligence. Once it accounts for that, you decide to one-box, because you understood the entanglement of the computation done by Omega and the physical process going on inside your skull.

If the entanglement is there, you are not looking at it from the outside, you are inside the process.

Our minds have this quirk that makes us think there are two moments, you decide, and then you cheat, you get to decide again. But if you are only allowed to decide once, which is the case, you are rational by one-boxing.

Comment author: dlthomas 28 October 2011 09:43:00PM 1 point [-]

I think you capture the essence of the solution, here.

Comment author: SeventhNadir 12 August 2010 09:30:16PM 0 points [-]

From what I understand, to be a "Rational Agent" in game theory means someone who maximises their utility function (and not the one you ascribe to them). To say Omega is rewarding irrational agents isn't necessarily fair, since payoffs aren't always about the money. Lottery tickets are a good example of this.

What if my utility function says the worst outcome is living the rest of my life with regrets that I didn't one box? Then I can one box and still be a completely rational agent.

Comment author: JoshuaZ 12 August 2010 09:36:35PM 9 points [-]

You're complicating the problem too much by bringing in issues like regret. Assume for sake of argument that Newcomb's problem is to maximize the amount of money you receive. Don't think about extraneous utility issues.

Comment author: SeventhNadir 12 August 2010 09:56:12PM 2 points [-]

Fair point. There are too many hidden variables already without me explicitly adding more. If Newcomb's problem is to maximise money received (with no regard for what is seen as reasonable), the "Why ain't you rich?" argument seems like a fairly compelling one, doesn't it? Winning the money is all that matters.

I just realised that all I've really done is paraphrase the original post. Curse you source monitoring error!

Comment author: Nornagest 19 November 2010 01:32:15AM *  3 points [-]

Lottery tickets exploit a completely different failure of rationality, that being our difficulties with small probabilities and big numbers, and our problems dealing with scale more generally. (ETA: The fantasies commonly cited in the context of lotteries' "true value" are a symptom of this failure.) It's not hard to come up with a game-theoretic agent that maximizes its payoffs against that kind of math. Second-guessing other agents' models is considerably harder.

I haven't given much thought to this particular problem for a while, but my impression is that Newcomb exposes an exploit in simpler decision theories that's related to that kind of recursive modeling: naively, if you trust Omega's judgment of your psychology, you pick the one-box option, and if you don't, you pick up both boxes. Omega's track record gives us an excellent reason to trust its judgment from a probabilistic perspective, but it's trickier to come up with an algorithm that stabilizes on that solution without immediately trying to outdo itself.

Comment author: PeterisP 24 October 2010 12:27:34PM 6 points [-]

Well, I fail to see any need for backward-in-time causation to get the prediction right 100 out of 100 times.

As far as I understand, similar experiments have been performed in practice, and homo sapiens split fairly cleanly into two groups, 'one-boxers' and 'two-boxers', who generally have strong preferences towards one or the other due to differences in their education, experience with logic, genetics, reasoning style, or whatever other factors are somewhat stable and specific to that individual.

Having perfect predictive power (or even the possibility of it existing) is implied and suggested, but it's not really given, it's not really necessary, and IMHO it's not possible and not useful to use this 'perfect predictive power' in any reasoning here.

From the given data in the situation (100 out of 100 that you saw), you know that Omega is a super-intelligent sorter who somehow manages to achieve 99.5% or better accuracy in sorting people into one-boxers and two-boxers.
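A rough check on how much a 100-out-of-100 record constrains Omega's accuracy can be sketched in Python. This assumes each prediction is an independent trial with one fixed accuracy, which the thought experiment does not state; the function names are illustrative only.

```python
def prob_perfect_record(accuracy: float, trials: int = 100) -> float:
    """Probability of `trials` correct predictions in a row,
    assuming independent trials with a fixed per-trial accuracy."""
    return accuracy ** trials


def accuracy_lower_bound(trials: int = 100, confidence: float = 0.95) -> float:
    """Smallest accuracy not rejected at a one-sided confidence level
    after `trials` successes and no failures.
    Solves accuracy ** trials == 1 - confidence."""
    return (1.0 - confidence) ** (1.0 / trials)


if __name__ == "__main__":
    for p in (0.90, 0.97, 0.995, 0.999):
        print(f"accuracy {p:.3f}: P(100/100) = {prob_perfect_record(p):.4f}")
    print(f"95% lower bound on accuracy: {accuracy_lower_bound():.4f}")
```

Under these assumptions a 99.5%-accurate sorter produces a perfect record about 60% of the time, while a 97%-accurate one still does so about 5% of the time. The record is therefore strong evidence of high accuracy, but on its own it does not pin the figure down to 99.5%.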

This accuracy seems also higher than the accuracy of most (all?) people in self-evaluation, i.e., as in many other decision scenarios, there is a significant difference in what people believe they would decide in situation X, and what they actually decide if it happens. [citation might be needed, but I don't have one at the moment, I do recall reading papers about such experiments]. The 'everybody is a perfect logician/rationalist and behaves as such' assumption often doesn't hold up in real life even for self-described perfect rationalists who make strong conscious effort to do so.

In effect, data suggests that probably Omega knows your traits and decision chances (taking into account you taking into account all this) better than you do - it's simply smarter than homo sapiens. Assuming that this is really so, it's better for you to choose option B. Assuming that this is not so, and you believe that you can out-analyze Omega's perception of yourself, then you should choose the opposite of whatever Omega would think of you (gaining 1,000,000 instead of 1,000 or 1,001,000 instead of 1,000,000). If you don't know what Omega knows about you - then you don't get this bonus.

Comment author: [deleted] 03 November 2012 04:18:08PM -1 points [-]

So what you're saying is that the only reason this problem is a problem is because the problem hasn't been defined narrowly enough. You don't know what Omega is capable of, so you don't know which choice to make. So there is no way to logically solve the problem (with the goal of maximizing utility) without additional information.

Here's what I'd do: I'd pick up B, open it, and take A iff I found it empty. That way, Omega's decision of what to put in the box would have to incorporate the variable of what Omega put in the box, causing an infinite regress which will use all cpu cycles until the process is terminated. Although that'll probably result in the AI picking an easier victim to torment and not even giving me a measly thousand dollars.

Comment author: Endovior 03 November 2012 04:56:43PM 1 point [-]

Okay... so since you already know, in advance of getting the boxes, that that's what you'd do, Omega can deduce that. So you open Box B, find it empty, and then take Box A. Enjoy your $1000. Omega doesn't need to infinite loop that one; he knows that you're the kind of person who'd try for Box A too.

Comment author: MixedNuts 03 November 2012 05:27:35PM 0 points [-]

No, putting $1 million in box B works too. Origin64 opens box B, takes the money, and doesn't take box A. It's like "This sentence is true." - whatever Omega does makes the prediction valid.
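The self-consistency point can be checked by brute force. The sketch below assumes Omega fills box B exactly when it predicts one-boxing and that the player really may look inside B before deciding about A; the strategy names are made up for illustration, and `peek_then_defy` is a hypothetical contrast rather than the strategy proposed above.

```python
from typing import Callable, List

# Input: True means box B contains the $1,000,000.
# Output: True means the player also takes box A (i.e. two-boxes).
Strategy = Callable[[bool], bool]


def peek_then_hedge(box_b_full: bool) -> bool:
    """Open B first; take A only if B turned out to be empty."""
    return not box_b_full


def peek_then_defy(box_b_full: bool) -> bool:
    """Hypothetical contrast: take A only if B turned out to be full."""
    return box_b_full


def consistent_predictions(strategy: Strategy) -> List[str]:
    """Return every prediction Omega could make and still turn out right."""
    consistent = []
    for predicted_two_box in (False, True):
        # Assumed rule: Omega fills B only for predicted one-boxers.
        box_b_full = not predicted_two_box
        actually_two_boxes = strategy(box_b_full)
        if actually_two_boxes == predicted_two_box:
            consistent.append("two-box" if predicted_two_box else "one-box")
    return consistent


if __name__ == "__main__":
    print(consistent_predictions(peek_then_hedge))
    print(consistent_predictions(peek_then_defy))
```

For `peek_then_hedge` both predictions come out consistent, which is the "This sentence is true" situation: no infinite regress, just two fixed points for Omega to pick from. For the contrasting strategy neither prediction is consistent, so a predictor bound by these rules could not be right against it.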

Comment author: blashimov 03 November 2012 05:39:40PM 0 points [-]

Which means you might end up with either amount of money, since you don't really know enough about Omega, instead of just the one box winnings. So you should still just one box?

Comment author: Endovior 04 November 2012 04:34:14AM 0 points [-]

Not how Omega looks at it. By definition, Omega looks ahead, sees a branch in which you would go for Box A, and puts nothing in Box B. There's no cheating Omega... just like you can't think "I'm going to one-box, but then open Box A after I've pocketed the million" there's no "I'm going to open Box B first, and decide whether or not to open Box A afterward". Unless Omega is quite sure that you have precommitted to never opening Box A ever, Box B contains nothing; the strategy of leaving Box A as a possibility if Box B doesn't pan out is a two-box strategy, and Omega doesn't allow it.

Comment author: TheOtherDave 04 November 2012 04:49:35AM 1 point [-]

Unless Omega is quite sure that you have precommitted to never opening Box A ever

Well, this isn't quite true. What Omega cares about is whether you will open Box A. From Omega's perspective it makes no difference whether you've precommitted to never opening it, or whether you've made no such precommitment but it turns out you won't open it for other reasons.

Comment author: inblankets 20 December 2012 07:49:45AM 0 points [-]

Assuming that Omega's "prediction" is in good faith, and that we can't "break" him as a predictor as a side effect of exploiting causality loops etc. in order to win.

Comment author: TheOtherDave 20 December 2012 02:30:39PM 0 points [-]

I'm not sure I understood that, but if I did, then yes, assuming that Omega is as described in the thought experiment. Of course, if Omega has other properties (for example, is an unreliable predictor) other things follow.

Comment author: Brilliand 03 September 2015 09:36:48PM *  0 points [-]

If you look in box B before deciding whether to choose box A, then you can force Omega to be wrong. That sounds like so much fun that I might choose it over the $1000.

Comment author: private_messaging 03 June 2012 10:14:37AM *  -1 points [-]

I suppose causal decision theory assumes causality only works in one temporal direction.

That's the popular understanding (or lack thereof) here and among philosophers in general. Philosophers just don't get math. If the decision theory is called causal but doesn't itself make any references to physics, then that's a slightly misleading name. I've written on that before.

The math doesn't go "hey hey, the theory is named causal therefore you can't treat 2 robot arms controlled by 2 control computers that run one function on one state, the same as 2 robot arms controlled by 1 computer". Confused sloppy philosophers do.

Also, the best case is to be predicted to 1-box but 2-box in reality. If the prediction works by backwards causality, well then causal decision theory one-boxes. If the prediction works by simulation, the causal decision theory can either have a world model where both the value inside the predictor and the value inside the actual robot are represented by the same action A, and 1-box, or it can have uncertainty as to whether the world outside of it is normal reality or the predictor's simulator, where it will again one-box (assuming it cares about the real money even if it is inside the predictor, which it would if it needs money to pay for, e.g., its child's education). It will also 1-box in the simulator and 2-box in reality if it can tell those apart.

Comment author: shokwave 03 June 2012 10:37:50AM 1 point [-]

I'm confused. Causal decision theory was invented or formalised almost entirely by philosophers. It takes the 'causal' in its name from its reliance on inductive logic and inference. It doesn't make sense to claim that philosophers are being sloppy about the word 'causal' here, and claiming that causal decision theory will accept backwards causality and one-box is patently false unless you mean something other than what the symbol 'causal decision theory' refers to when you say 'causal decision theory'.

Comment author: private_messaging 03 June 2012 12:09:22PM *  -1 points [-]

Firstly, the notion that actions should be chosen based on their consequences, taking the actions as the cause of the consequences, was definitely not invented by philosophers. Secondly, logical causality is not identical to physical causality (the latter is dependent on specific laws of physics). Thirdly, not all philosophers are sloppy; some are very sloppy, some are less sloppy. Fourthly, anything that was not put in mathematical form to be manipulated using formal methods is not formalized. When you formalize stuff you end up stripping the notion of self unless it is explicitly included as part of the formalism, stripping the notion of the time where the math is working unless it is explicitly included as part of the formalism, and so on, ending up without the problem.

Maybe you are correct; it is better to let the symbol 'causal decision theory' refer to confused philosophy. Then we would need some extra symbol for how agents implementable using mathematics actually decide (and how robots that predict outcomes of their actions on a world model actually work), which is very, very similar to 'causal decision theory' sans all the human preconditions of what self is.

Comment author: shokwave 03 June 2012 12:21:37PM 0 points [-]

I notice I actually agree with you - if we did try, using mathematics, to implement agents who decide and predict in the manner you describe, we'd find it incorrect to describe these agents as causal decision theory agents. In fact, I also expect we'd find ourselves disillusioned with CDT in general, and if philosophers brought it up, we'd direct them to instead engage with the much more interesting agents we've mathematically formalised.

Comment author: private_messaging 03 June 2012 12:39:56PM *  0 points [-]

Well, each philosopher's understanding of CDT seems to differ from the other's:


The notion that actions should be chosen based on consequences - as expressed in the formula here - is perfectly fine, albeit incredibly trivial. You can formalize that all the way into an agent. I've written such agents myself. We still need a symbol to describe this type of agent.

But philosophers go from this to "my actions should be chosen based on consequences", and it is all about the true meaning of self and falls within the purview of your conundrums of philosophy.

Having 1 computer control 2 robot arms wired in parallel, and having 2 computers running the exact same software as before, controlling 2 robot arms, makes no difference for software engineering; it's a minor detail that has been entirely abstracted away from the software. There is a difference for philosophizing, though, because you can't collapse logical consequences and physical causality into one thing in the latter case.

edit: anyhow. to summarize my point: In terms of agents actually formalized in software, one-boxing is only a matter of implementing the predictor into the world model somehow, either as a second servo controlled by the same control variables, or as uncertain world state outside the senses (in the unseen there's either the real world or a simulator that affects the real world via the hand of the predictor). No conceptual problems whatsoever. edit: Good analogy: the 'twin paradox' in special relativity. There's only a paradox if nobody has done the math right.
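The "second servo controlled by the same control variables" idea can be sketched as a toy agent. This is a minimal illustration, not any particular formalization from the decision-theory literature: the agent simply picks the action with the best modelled payoff, and the only thing that changes is the world model it is handed. All names are hypothetical.

```python
from typing import Callable

ACTIONS = ("one-box", "two-box")


def payoff(action: str, box_b_full: bool) -> int:
    """Money received given the action taken and the content of box B."""
    total = 1_000_000 if box_b_full else 0
    if action == "two-box":
        total += 1_000
    return total


def linked_model(action: str) -> int:
    """World model where the predictor runs the same decision function,
    so its output is the same variable as the agent's own action."""
    predicted = action
    return payoff(action, box_b_full=(predicted == "one-box"))


def make_fixed_model(box_b_full: bool) -> Callable[[str], int]:
    """World model where box B's content is a fixed fact the action cannot affect."""
    def model(action: str) -> int:
        return payoff(action, box_b_full)
    return model


def choose(world_model: Callable[[str], int]) -> str:
    """Pick the action whose modelled consequence pays the most."""
    return max(ACTIONS, key=world_model)


if __name__ == "__main__":
    print(choose(linked_model))
    print(choose(make_fixed_model(True)))
    print(choose(make_fixed_model(False)))
```

The same `choose` function one-boxes under `linked_model` and two-boxes under either fixed model, so in this sketch the whole disagreement sits in which world model is supplied, not in the rule for picking actions.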

Comment author: Polaris 16 July 2014 01:23:19PM 1 point [-]


Agreed, the problem immediately reminded me of "retroactive preparation" and time-loop logic. It is not really the same reasoning, but it has the same "turn causality on its head" aspect.

If I don't have proof of the reliability of Omega's predictions, I find myself less likely to be "unreasonable" when the stakes are higher (that is, I'm more likely to two-box if it's about saving the world).

I find it highly unlikely that an entity wandering across worlds can predict my actions to this level of detail, as it seems way harder than traveling through space or teleporting money. I might risk a net loss of $1,000 to figure it out (much like I'd be willing to spend $1,000 to interact with such a space-traveling stuff-teleporting entity), but not a loss of a thousand lives. In the game as the article describes it, I would only one-box if "the loss of what box A contains and nothing in B" was an acceptable outcome.

I would be increasingly likely to one-box as the probability of the AI being actually able to predict my actions in advance increases.
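How the choice shifts with the predictor's reliability can be made concrete with a small expected-value sketch. It assumes the standard payoffs ($1,000 in box A, $1,000,000 in box B), a risk-neutral valuation of money, and a single accuracy figure that applies whichever way you choose. Conditioning on your own choice like this is exactly the step causal decision theorists dispute, so treat it as one way of scoring the options rather than a settled answer.

```python
from typing import Tuple


def expected_payoffs(accuracy: float,
                     small: int = 1_000,
                     large: int = 1_000_000) -> Tuple[float, float]:
    """Expected money for (one-boxing, two-boxing), assuming the predictor
    is right with probability `accuracy` whichever option is chosen."""
    one_box = accuracy * large                   # B is full only if predicted correctly
    two_box = small + (1.0 - accuracy) * large   # B is full only if predicted wrongly
    return one_box, two_box


def break_even_accuracy(small: int = 1_000, large: int = 1_000_000) -> float:
    """Accuracy at which both options have equal expected payoff.
    Solves accuracy * large == small + (1 - accuracy) * large."""
    return (large + small) / (2 * large)


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.9, 0.99):
        one, two = expected_payoffs(p)
        print(f"accuracy {p:.2f}: one-box {one:,.0f}  two-box {two:,.0f}")
    print(f"break-even accuracy: {break_even_accuracy():.4f}")
```

With these payoffs the break-even accuracy is 0.5005, so under this scoring one-boxing comes out ahead for any predictor even slightly better than a coin flip. Raising the stakes on box A relative to box B (the `small` and `large` parameters) moves that threshold upward.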

Comment author: private_messaging 16 July 2014 03:07:50PM *  0 points [-]

The thing is, this 'modern decision theory', rather than being some sort of central pillar as you'd assume from the name, is mostly philosophers "struggling in the periphery to try to tell us something", as Feynman once said about philosophers of science.

When it comes to any actual software which does something, this everyday notion of 'causality' proves to be a very slippery concept. This Rube Goldberg machine-like model of the world, where you push a domino and it pushes another domino, and the chain goes to your reward, is just very approximate physics that people tend to use to make decisions. It's not fundamental, and interesting models of decision making are generally set up to learn that from observed data (which of course makes it impossible to do lazy philosophy involving various verbal hypotheticals where the observations that would lead the agent to believe the problem setup are not specified).