Newcomb's Problem and Regret of Rationality

Eliezer Yudkowsky

31st Jan 2008

The following may well be the most controversial dilemma in the history of decision theory:

A superintelligence from another galaxy, whom we shall call Omega, comes to Earth and sets about playing a strange little game. In this game, Omega selects a human being, sets down two boxes in front of them, and flies away.

Box A is transparent and contains a thousand dollars.
Box B is opaque, and contains either a million dollars, or nothing.

You can take both boxes, or take only box B.

And the twist is that Omega has put a million dollars in box B iff Omega has predicted that you will take only box B.

Omega has been correct on each of 100 observed occasions so far - everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars. (We assume that box A vanishes in a puff of smoke if you take only box B; no one else can take box A afterward.)

Before you make your choice, Omega has flown off and moved on to its next game. Box B is already empty or already full.

Omega drops two boxes on the ground in front of you and flies off.

Do you take both boxes, or only box B?

And the standard philosophical conversation runs thusly:

One-boxer: "I take only box B, of course. I'd rather have a million than a thousand."

Two-boxer: "Omega has already left. Either box B is already full or already empty. If box B is already empty, then taking both boxes nets me $1000, taking only box B nets me $0. If box B is already full, then taking both boxes nets $1,001,000, taking only box B nets $1,000,000. In either case I do better by taking both boxes, and worse by leaving a thousand dollars on the table - so I will be rational, and take both boxes."

One-boxer: "If you're so rational, why ain'cha rich?"

Two-boxer: "It's not my fault Omega chooses to reward only people with irrational dispositions, but it's already too late for me to do anything about that."
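The two positions in this exchange can be laid side by side with a little arithmetic. The sketch below is only an illustration: the names `PAYOFF`, `dominates`, and `payoff_with_perfect_predictor` are made up for this example, and the second function assumes a predictor that never errs, which is stronger than the 100-for-100 record the problem actually states.

```python
# Payoffs in dollars, indexed by (your choice, actual contents of box B).
PAYOFF = {
    ("two-box", "empty"): 1_000,
    ("two-box", "full"): 1_001_000,
    ("one-box", "empty"): 0,
    ("one-box", "full"): 1_000_000,
}


def dominates(a, b):
    """True if choice a pays strictly more than choice b for every fixed box state."""
    return all(PAYOFF[(a, s)] > PAYOFF[(b, s)] for s in ("empty", "full"))


def payoff_with_perfect_predictor(choice):
    """Payoff when box B's contents always match the choice.

    Assumption for illustration: Omega is never wrong.
    """
    state = "full" if choice == "one-box" else "empty"
    return PAYOFF[(choice, state)]


# The two-boxer's argument: holding the box state fixed, two-boxing is better.
print(dominates("two-box", "one-box"))
# The one-boxer's argument: compare what each kind of agent walks away with.
print(payoff_with_perfect_predictor("one-box"))
print(payoff_with_perfect_predictor("two-box"))
```

Both computations are correct as far as they go; the dispute is over which of them should drive the decision.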

There is a large literature on the topic of Newcomblike problems - especially if you consider the Prisoner's Dilemma as a special case, which it is generally held to be. "Paradoxes of Rationality and Cooperation" is an edited volume that includes Newcomb's original essay. For those who read only online material, this PhD thesis summarizes the major standard positions.

I'm not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and Omega is just rewarding agents with irrational dispositions. This dominant view goes by the name of "causal decision theory".

As you know, the primary reason I'm blogging is that I am an incredibly slow writer when I try to work in any other format. So I'm not going to try to present my own analysis here. Way too long a story, even by my standards.

But it is agreed even among causal decision theorists that if you have the power to precommit yourself to take one box, in Newcomb's Problem, then you should do so. If you can precommit yourself before Omega examines you, then you are directly causing box B to be filled.

Now in my field - which, in case you have forgotten, is self-modifying AI - this works out to saying that if you build an AI that two-boxes on Newcomb's Problem, it will self-modify to one-box on Newcomb's Problem, if the AI considers in advance that it might face such a situation. Agents with free access to their own source code have access to a cheap method of precommitment.

What if you expect that you might, in general, face a Newcomblike problem, without knowing the exact form of the problem? Then you would have to modify yourself into a sort of agent whose disposition was such that it would generally receive high rewards on Newcomblike problems.

But what does an agent with a disposition generally-well-suited to Newcomblike problems look like? Can this be formally specified?

Yes, but when I tried to write it up, I realized that I was starting to write a small book. And it wasn't the most important book I had to write, so I shelved it. My slow writing speed really is the bane of my existence. The theory I worked out seems, to me, to have many nice properties besides being well-suited to Newcomblike problems. It would make a nice PhD thesis, if I could get someone to accept it as my PhD thesis. But that's pretty much what it would take to make me unshelve the project. Otherwise I can't justify the time expenditure, not at the speed I currently write books.

I say all this, because there's a common attitude that "Verbal arguments for one-boxing are easy to come by, what's hard is developing a good decision theory that one-boxes" - coherent math which one-boxes on Newcomb's Problem without producing absurd results elsewhere. So I do understand that, and I did set out to develop such a theory, but my writing speed on big papers is so slow that I can't publish it. Believe it or not, it's true.

Nonetheless, I would like to present some of my motivations on Newcomb's Problem - the reasons I felt impelled to seek a new theory - because they illustrate my source-attitudes toward rationality. Even if I can't present the theory that these motivations motivate...

First, foremost, fundamentally, above all else:

Rational agents should WIN.

Don't mistake me, and think that I'm talking about the Hollywood Rationality stereotype that rationalists should be selfish or shortsighted. If your utility function has a term in it for others, then win their happiness. If your utility function has a term in it for a million years hence, then win the eon.

But at any rate, WIN. Don't lose reasonably, WIN.

Now there are defenders of causal decision theory who argue that the two-boxers are doing their best to win, and cannot help it if they have been cursed by a Predictor who favors irrationalists. I will talk about this defense in a moment. But first, I want to draw a distinction between causal decision theorists who believe that two-boxers are genuinely doing their best to win, versus someone who thinks that two-boxing is the reasonable or the rational thing to do, but that the reasonable move just happens to predictably lose, in this case. There are a lot of people out there who think that rationality predictably loses on various problems - that, too, is part of the Hollywood Rationality stereotype, that Kirk is predictably superior to Spock.

Next, let's turn to the charge that Omega favors irrationalists. I can conceive of a superbeing who rewards only people born with a particular gene, regardless of their choices. I can conceive of a superbeing who rewards people whose brains inscribe the particular algorithm of "Describe your options in English and choose the last option when ordered alphabetically," but who does not reward anyone who chooses the same option for a different reason. But Omega rewards people who choose to take only box B, regardless of which algorithm they use to arrive at this decision, and this is why I don't buy the charge that Omega is rewarding the irrational. Omega doesn't care whether or not you follow some particular ritual of cognition; Omega only cares about your predicted decision.

We can choose whatever reasoning algorithm we like, and will be rewarded or punished only according to that algorithm's choices, with no other dependency - Omega just cares where we go, not how we got there.

It is precisely the notion that Nature does not care about our algorithm, which frees us up to pursue the winning Way - without attachment to any particular ritual of cognition, apart from our belief that it wins. Every rule is up for grabs, except the rule of winning.

As Miyamoto Musashi said - it's really worth repeating:

"You can win with a long weapon, and yet you can also win with a short weapon. In short, the Way of the Ichi school is the spirit of winning, whatever the weapon and whatever its size."

(Another example: It was argued by McGee that we must adopt bounded utility functions or be subject to "Dutch books" over infinite times. But: The utility function is not up for grabs. I love life without limit or upper bound: There is no finite amount of life lived N where I would prefer an 80.0001% probability of living N years to an 0.0001% chance of living a googolplex years and an 80% chance of living forever. This is a sufficient condition to imply that my utility function is unbounded. So I just have to figure out how to optimize for that morality. You can't tell me, first, that above all I must conform to a particular ritual of cognition, and then that, if I conform to that ritual, I must change my morality to avoid being Dutch-booked. Toss out the losing ritual; don't change the definition of winning. That's like deciding to prefer $1000 to $1,000,000 so that Newcomb's Problem doesn't make your preferred ritual of cognition look bad.)

"But," says the causal decision theorist, "to take only one box, you must somehow believe that your choice can affect whether box B is empty or full - and that's unreasonable! Omega has already left! It's physically impossible!"

Unreasonable? I am a rationalist: what do I care about being unreasonable? I don't have to conform to a particular ritual of cognition. I don't have to take only box B because I believe my choice affects the box, even though Omega has already left. I can just... take only box B.

I do have a proposed alternative ritual of cognition which computes this decision, which this margin is too small to contain; but I shouldn't need to show this to you. The point is not to have an elegant theory of winning - the point is to win; elegance is a side effect.

Or to look at it another way: Rather than starting with a concept of what is the reasonable decision, and then asking whether "reasonable" agents leave with a lot of money, start by looking at the agents who leave with a lot of money, develop a theory of which agents tend to leave with the most money, and from this theory, try to figure out what is "reasonable". "Reasonable" may just refer to decisions in conformance with our current ritual of cognition - what else would determine whether something seems "reasonable" or not?

From James Joyce (no relation), Foundations of Causal Decision Theory:

Rachel has a perfectly good answer to the "Why ain't you rich?" question. "I am not rich," she will say, "because I am not the kind of person the psychologist thinks will refuse the money. I'm just not like you, Irene. Given that I know that I am the type who takes the money, and given that the psychologist knows that I am this type, it was reasonable of me to think that the $1,000,000 was not in my account. The $1,000 was the most I was going to get no matter what I did. So the only reasonable thing for me to do was to take it."

Irene may want to press the point here by asking, "But don't you wish you were like me, Rachel? Don't you wish that you were the refusing type?" There is a tendency to think that Rachel, a committed causal decision theorist, must answer this question in the negative, which seems obviously wrong (given that being like Irene would have made her rich). This is not the case. Rachel can and should admit that she does wish she were more like Irene. "It would have been better for me," she might concede, "had I been the refusing type." At this point Irene will exclaim, "You've admitted it! It wasn't so smart to take the money after all." Unfortunately for Irene, her conclusion does not follow from Rachel's premise. Rachel will patiently explain that wishing to be a refuser in a Newcomb problem is not inconsistent with thinking that one should take the $1,000 whatever type one is. When Rachel wishes she was Irene's type she is wishing for Irene's options, not sanctioning her choice.

It is, I would say, a general principle of rationality - indeed, part of how I define rationality - that you never end up envying someone else's mere choices. You might envy someone their genes, if Omega rewards genes, or if the genes give you a generally happier disposition. But Rachel, above, envies Irene her choice, and only her choice, irrespective of what algorithm Irene used to make it. Rachel wishes just that she had a disposition to choose differently.

You shouldn't claim to be more rational than someone and simultaneously envy them their choice - only their choice. Just do the act you envy.

I keep trying to say that rationality is the winning-Way, but causal decision theorists insist that taking both boxes is what really wins, because you can't possibly do better by leaving $1000 on the table... even though the single-boxers leave the experiment with more money. Be careful of this sort of argument, any time you find yourself defining the "winner" as someone other than the agent who is currently smiling from on top of a giant heap of utility.

Yes, there are various thought experiments in which some agents start out with an advantage - but if the task is to, say, decide whether to jump off a cliff, you want to be careful not to define cliff-refraining agents as having an unfair prior advantage over cliff-jumping agents, by virtue of their unfair refusal to jump off cliffs. At this point you have covertly redefined "winning" as conformance to a particular ritual of cognition. Pay attention to the money!

Or here's another way of looking at it: Faced with Newcomb's Problem, would you want to look really hard for a reason to believe that it was perfectly reasonable and rational to take only box B because, if such a line of argument existed, you would take only box B and find it full of money? Would you spend an extra hour thinking it through, if you were confident that, at the end of the hour, you would be able to convince yourself that box B was the rational choice? This too is a rather odd position to be in. Ordinarily, the work of rationality goes into figuring out which choice is the best - not finding a reason to believe that a particular choice is the best.

Maybe it's too easy to say that you "ought to" two-box on Newcomb's Problem, that this is the "reasonable" thing to do, so long as the money isn't actually in front of you. Maybe you're just numb to philosophical dilemmas, at this point. What if your daughter had a 90% fatal disease, and box A contained a serum with a 20% chance of curing her, and box B might contain a serum with a 95% chance of curing her? What if there was an asteroid rushing toward Earth, and box A contained an asteroid deflector that worked 10% of the time, and box B might contain an asteroid deflector that worked 100% of the time?

Would you, at that point, find yourself tempted to make an unreasonable choice?

If the stake in box B was something you could not leave behind? Something overwhelmingly more important to you than being reasonable? If you absolutely had to win - really win, not just be defined as winning?

Would you wish with all your power that the "reasonable" decision was to take only box B?

Then maybe it's time to update your definition of reasonableness.

Alleged rationalists should not find themselves envying the mere decisions of alleged nonrationalists, because your decision can be whatever you like. When you find yourself in a position like this, you shouldn't chide the other person for failing to conform to your concepts of reasonableness. You should realize you got the Way wrong.

So, too, if you ever find yourself keeping separate track of the "reasonable" belief, versus the belief that seems likely to be actually true. Either you have misunderstood reasonableness, or your second intuition is just wrong.

Now one can't simultaneously define "rationality" as the winning Way, and define "rationality" as Bayesian probability theory and decision theory. But it is the argument that I am putting forth, and the moral of my advice to Trust In Bayes, that the laws governing winning have indeed proven to be math. If it ever turns out that Bayes fails - receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions - then Bayes has to go out the window. "Rationality" is just the label I use for my beliefs about the winning Way - the Way of the agent smiling from on top of the giant heap of utility. Currently, that label refers to Bayescraft.

I realize that this is not a knockdown criticism of causal decision theory - that would take the actual book and/or PhD thesis - but I hope it illustrates some of my underlying attitude toward this notion of "rationality".

You shouldn't find yourself distinguishing the winning choice from the reasonable choice. Nor should you find yourself distinguishing the reasonable belief from the belief that is most likely to be true.

That is why I use the word "rational" to denote my beliefs about accuracy and winning - not to denote verbal reasoning, or strategies which yield certain success, or that which is logically provable, or that which is publicly demonstrable, or that which is reasonable.

As Miyamoto Musashi said:

"The primary thing when you take a sword in your hands is your intention to cut the enemy, whatever the means. Whenever you parry, hit, spring, strike or touch the enemy's cutting sword, you must cut the enemy in the same movement. It is essential to attain this. If you think only of hitting, springing, striking or touching the enemy, you will not be able actually to cut him."

620 Comments

TobyBartels16y20

Thanks for the replies, everybody!

This is a global response to several replies within my little thread here, so I've put it at nearly the top level. Hopefully that works out OK.

I'm glad that FAWS brought up the probabilistic version. That's because the greater the probability that Omega makes mistakes, the more inclined I am to take two boxes. I once read the claim that 70% of people, when told Newcomb's Paradox in an experiment, claim to choose to take only one box. If this is accurate, then Omega can achieve a 70% level of accuracy by predicting that everybody is a one-boxer. Even if 70% is not accurate, you can still make the paradox work by adjusting the dollar amounts, as long as the bias is great enough that Omega can be confident that it will show up at all in the records of its past predictions. (To be fair, the proportion of two-boxers will probably rise as Omega's accuracy falls, and changing the stakes should also affect people's choices; there may not be a fixed point, although I expect that there is.)

If, in addition to the problem as stated (but with only 70% probability of success), I know that Omega always predicts one-boxing, then (hopefully) everybody agrees that I should take both boxes. There needs to be some correlation between Omega's predictions and the actual outcomes, not just a high proportion of past successes.
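A rough sketch of that point, assuming the 70% one-boxer share quoted above and a hypothetical Omega that predicts one-boxing for everyone (the function name `constant_predictor_stats` is made up for this illustration):

```python
from fractions import Fraction


def constant_predictor_stats(one_boxer_share):
    """Track record and payoffs for an Omega that ALWAYS predicts one-boxing.

    Assumption for illustration: since the prediction never varies,
    box B is always full, whatever the player goes on to do.
    """
    accuracy = one_boxer_share          # correct exactly when the player one-boxes
    one_box_payoff = 1_000_000          # B is full, A is left behind
    two_box_payoff = 1_000_000 + 1_000  # B is full, A is taken as well
    return accuracy, one_box_payoff, two_box_payoff


accuracy, one_box, two_box = constant_predictor_stats(Fraction(7, 10))
print(accuracy)           # share of correct predictions in the record
print(two_box - one_box)  # what two-boxing gains against this predictor
```

The record looks respectable, but the prediction carries no information about the individual player, so taking both boxes gains the extra thousand dollars every time.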

FAWS also writes:

You yourself claim to know what you would do in the boxing experiment

Actually, I don't really want to make that claim. Although I've written things like ‘I would take both boxes’, I really should have written ‘I should take both boxes’. I'm stating a correct decision, not making a prediction about my actual actions. Right now, I predict about a 70% chance of two-boxing given the situation as stated in the original post, although I've never tried to calculate my estimates of probabilities, so who knows what that really means. (H'm, 70% again? Nope, I don't trust that calibration at all!)

FAWS writes elsewhere:

Making a choice between two options […] just means that attributing the reason for your taking whatever option you take is most usefully attributed to you (and not e.g. gravity, government, the person holding a gun to your head etc.).

I don't see what the gun has to do with it; this is a perfectly good problem in decision theory:

Suppose that you have a button that, if pressed, will trigger a bomb that kills two strangers on the other side of the world. I hold a gun to your head and threaten to shoot you if you don't press the button. Should you press it?

A person who presses the button in that situation can reasonably say afterwards ‘I had no choice! Toby held a gun to my head!’, but that doesn't invalidate the question. Such a person might even panic and make the question irrelevant, but it's still a good question.

If it is a fact about you that you will leave the choice up to chance then Omega probably doesn't offer you to take part in the first place.

So that's how Omega gets such a good record! (^_^)

Understanding the question really is important. I've been interpreting it something along these lines: you interrupt your normal thought processes to go through a complete evaluation of the situation before you, then see what you do. (This is exactly what you cannot do if you panic in the gun problem above.) So perhaps we can predict with certain accuracy that an utter bigot will take one course of action, but that is not what the bigot should do, nor is it what they will do if they discard their prejudices and decide afresh.

Now that I think about it, I see some problems with this interpretation, and also some refinements that might fix it. (The first thing to do is to make it less dependent on the specific person making the decision.) But I'll skip the refinements. It's enough to notice that Omega might very well predict that a person will not take the time to think things through, so there is poor correlation between what one should do and what Omega will predict, even though the decision is based on what the world would be like if one did take the time.

I still think that (modulo refinements) this is a good interpretation of what most people would mean if they tell a story and then ask ‘What should this person do?’. (I can try to defend that claim if anybody still wants me to after they finish this comment.) In that case, I stand by my decision that one should take both boxes, at least if there is no good evidence of new physics.

However, I now realise that there is another interpretation, which is more practical, however much the ordinary person might not interpret things this way. That is: sit down and think through the whole situation now, long before you are ever faced with it in real life, and decide what to do. One obvious benefit of this is that when I hold a gun to your head, you won't panic, because you will be prepared. More generally, this is what we are all actually doing right now! So as we make these idle philosophical musings, let's be practical, and decide what we'll do if Omega ever offers us this deal.

In this case, I agree that I will be better off (given the extremely unlikely but possible assumption that I am ever in this situation) if I have decided now to take only Box B. As RobinZ points out, I might change my mind later, but that can't be helped (and to a certain extent shouldn't be helped, since it's best if I take two boxes after Omega predicts that I'll only take one, but we can't judge that extent if Omega is smarter than us, so really there's no benefit to holding back at all).

If Omega is fallible, then the value of one-boxing falls drastically, and even adjusting the amount of money doesn't help in the end; once Omega's proportion of past success matches the observed proportion in experiments (or whatever our best guess of the actual proportion of real people is), then I'm back to two-boxing, since I expect that Omega simply always predicts one-boxing.

In hindsight, it's obvious that the original post was about decision in this sense, since Eliezer was talking about an AI that modifies its decision procedures in anticipation of facing Omega in the future. Similarly, we humans modify our decision procedures by making commitments and letting ourselves invent rationalisations for them afterwards (although the problem with this is that it makes it hard to change our minds when we receive new information). So obviously Eliezer wants us to decide now (or at least well ahead of time) and use our leet Methods of Rationality to keep the rationalisations in check.

So I hereby decide that I will pick only one box. (You hear that, Omega!?) Since I am honest (and strongly doubt that Omega exists), I'll add that I may very well change my mind if this ever really happens, but that's about what I would do, not what I should do. And in a certain sense, I should change my mind … then. But in another sense, I should (and do!) choose to be a one-boxer now.

(Thanks also to CarlShulman, whom I haven't quoted, but whose comment was a big help in drawing my attention to the different senses of ‘should’, even though I didn't really adopt his analysis of them.)

nhamann16y100

If Omega is fallible, then the value of one-boxing falls drastically, and even adjusting the amount of money doesn't help in the end;

Assume Omega has a probability X of correctly predicting your decision:

If you choose to two-box:

X chance of getting $1000
(1-X) chance of getting $1,001,000

If you choose to take box B only:

X chance of getting $1,000,000
(1-X) chance of getting $0

Your expected utilities for two-boxing and one-boxing are (respectively):

E2 = 1000X + (1-X)1001000
E1 = 1000000X

For E2 > E1, we must have 1000X + 1,001,000 ...
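The comment is cut off in this copy, and the sketch below does not try to reconstruct the missing text. It only evaluates the two formulas given above, E2 = 1000X + (1-X)1001000 and E1 = 1000000X, treating dollars as utility the way the comment does. The function names are made up for this illustration.

```python
from fractions import Fraction


def expected_two_box(x):
    """E2 from the comment: x is the probability that Omega predicts correctly."""
    return 1_000 * x + 1_001_000 * (1 - x)


def expected_one_box(x):
    """E1 from the comment."""
    return 1_000_000 * x


# Setting E2 = E1:
#   1,001,000 - 1,000,000 X = 1,000,000 X
#   X = 1,001,000 / 2,000,000
break_even = Fraction(1_001_000, 2_000_000)

print(break_even)  # accuracy at which the two choices are worth the same
print(expected_one_box(Fraction(7, 10)), expected_two_box(Fraction(7, 10)))
```

Under these formulas, one-boxing has the higher expected payoff whenever X is above the break-even value, and two-boxing whenever it is below. Note that the formulas assume Omega's errors are independent of your choice, which is exactly what the always-predicts-one-boxing scenario earlier in the thread denies.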