Rachel and Irene are walking home while discussing Newcomb's problem. Irene explains her position:

"Rational agents win. If I take both boxes, I end up with $1000. If I take one box, I end up with $1,000,000. It shouldn't matter why I'm making the decision; there's an obvious right answer here. If you walk away with less money yet claim you made the 'rational' decision, you don't seem to have a very good understanding of what it means to be rational".

Before Rachel can respond, Omega appears from around the corner. It sets two boxes on the ground. One is opaque, and the other is transparent. The transparent one clearly has $1000 inside. Omega says "I've been listening to your conversation and decided to put you to the test. These boxes each have fingerprint scanners that will only open for Irene. In 5 minutes, both boxes will incinerate their contents. The opaque box has $1,000,000 in it iff I predicted that Irene would not open the transparent box. Also, this is my last test of this sort, and I was programmed to self-destruct after I'm done." Omega proceeds to explode into tiny shards of metal.

Being in the sort of world where this kind of thing happens from time to time, Irene and Rachel don't think much of it. Omega has always been right in the past. (Although this is the first time it's self-destructed afterwards.) Irene promptly walks up to the opaque box and opens it, revealing $1,000,000, which she puts into her bag. She begins to walk away, when Rachel says:

"Hold on just a minute now. There's $1000 in that other box, which you can open. Omega can't take the $1,000,000 away from you now that you have it. You're just going to leave that $1000 there to burn?"

"Yup. I pre-committed to one-box on Newcomb's problem, since it results in me getting $1,000,000. The only alternative would have resulted in that box being empty, and me walking away with only $1000. I made the rational decision."

"You're perfectly capable of opening that second box. There's nothing physically preventing you from doing so. If I held a gun to your head and threatened to shoot you if you didn't open it, I think you might do it. If that's not enough, I could threaten to torture all of humanity for ten thousand years. I'm pretty sure at that point, you'd open it. So you aren't 'pre-committed' to anything. You're simply choosing not to open the box, and claiming that walking away $1000 poorer makes you the 'rational' one. Isn't that exactly what you told me that truly rational agents didn't do?"

"Good point", says Irene. She opens the second box, and goes home with $1,001,000. Why shouldn't she? Omega's dead.


"Good point", says Irene. She opens the second box, and goes home with $1,001,000. Why shouldn't she? Omega's dead.

 

SIMULATION COMPLETE.

RESULT: BOTH BOXES TAKEN.

OMEGA-COMMITMENT CONSISTENT ACTION: OPAQUE BOX EMPTY

Irene promptly walks up to the opaque box and opens it, revealing nothing.  She stares in shock.

Beautifully said.

Although at that point she could respond out of spite by refusing to open the transparent box. Sure, it leaves $1000 to burn, but maybe that's worth it to her just to spit on Omega's grave by proving it wrong.

…which would force this scenario to also be a simulation.

Meaning that Omega cannot fulfill its directive when predicting spiteful Irene's actions. She'll one-box iff Omega predicts she'll two-box.

Oh dear.

I don't find the simulation argument very compelling. I can conceive of many ways for Omega to arrive at a prediction with high probability of being correct that don't involve a full, particle-by-particle simulation of the actors.

[This comment is no longer endorsed by its author]

The underlying question remains the accuracy of the prediction and what sequences of events (if any) can include Omega being incorrect.  

In the "strong omega" scenarios, the opaque box is empty in all the universes where Irene opens the transparent box (including after Omega's death).  Yoav's description seems right to me - Irene opens the opaque box, and is SHOCKED to find it empty, as she only planned to open the one box.  But her prediction of her behavior was incorrect, not Omega's prediction.

In "weak omega" scenarios, who knows what the specifics are?  Maybe Omega's wrong in this case.

In the traditional problem, you have to decide to discard the transparent box before opening the opaque box (single decision step). Here, you're making sequential choices, so there is a policy that makes "strong Omega" inconsistent (namely, discarding B just when you see that A is empty).

I liked the story, but I think it misses some of how Newcomb's problem works (at least as I understand it). Also, welcome to LessWrong! I think this is a pretty good first post.

So, this is not a problem of decision theory, but a problem with Omega's predictive capabilities in this story (maybe she was always right before this, but she wasn't here).

In the case where you find yourself holding the $1,000,000 and the $1,000 is still available, sure, you can pick it up. That only happens if either Omega failed to predict what you will do, or if you somehow set things up such that you couldn't break your precommitment, or would have to pay a big price to do so.

If Omega really had perfect predictive capabilities in this story, this is what would happen:

-

Irene promptly walks up to the opaque box and opens it, revealing... Nothing.

"What?!" Irene exclaimed, "But I didn't mean to open the transparent box. I precommited to not doing it!"

"Maybe Omega knew better than you." Rachel said, "Anyway, now that you're only left with the $1000 in the transparent box, are you going to take them?"

"What? no! Then I would break my commitment and prove Omega right!"

"You can either do that, or walk out with a $1000. You're not getting anything from not taking it, I'm sure if I threatened you to take it you would. That's the rational decision."

Irene sighed, "Good point, I guess." And she opened the box and walked out with a $1000.

-

Also, it assumes a non-iterated game, and by non-iterated I don't just mean that she doesn't play against Omega again, but that she doesn't play against anyone else again; otherwise this becomes part of her reputation and thus is no longer "free".

In the case where you find yourself holding the $1,000,000 and the $1,000 is still available, sure, you can pick it up. That only happens if either Omega failed to predict what you will do, or if you somehow set things up such that you couldn't break your precommitment, or would have to pay a big price to do so.

I don't think that's true. The traditional Newcomb's problem could use the exact setup I used here; the only difference would be that either the opaque box is empty, or Irene never opens the transparent box. The idea that the $1000 is always "available" to the player is central to Newcomb's problem.

In my comment "that" in "That only happens if" referred to you taking the $1,000, not to them being available. So to clarify:

If we assume that Omega's predictions are perfect, then you only find $1,000,000 in the box in cases where for some reason you don't also take the $1,000:

  • Maybe you have some beliefs about why you shouldn't do it
  • Maybe it's against your honor to do it
  • Maybe you're programmed not to do it
  • Maybe before you met Omega you gave a friend $2,000 and told him to give it back to you only if you don't take the $1,000, and otherwise to burn it.

If you find yourself going out with the contents of both boxes, either you're in a simulation or Omega was wrong.

If Omega is wrong (and it's a one-shot, and you know you're not in a simulation) then yeah, you have no reason not to take the $1,000 too. But the less accurate Omega is, the less Newcomblike the problem is.

I think you're missing my point. After the $1,000,000 has been taken, Irene doesn't suddenly lose her free will. She's perfectly capable of taking the $1000; she's just decided not to.

You seem to think I'm making some claim like "one-boxing is irrational" or "Newcomb's problem is impossible", which is not at all what I'm doing. I'm trying to demonstrate that the idea of "rational agents just do what maximizes their utility and don't worry about having to have a consistent underlying decision theory" appears to result in a contradiction as soon as Irene's decision has been made.

I understood your point. What I'm saying is that Irene is indeed capable of also taking the $1000, but if Omega isn't wrong, she only gets the million in cases where for some reason she doesn't (and I gave a few examples).

I think your scenario is just too narrow. Sure, if Omega is wrong, and it's not a simulation, and it's a complete one-shot, then the rational decision is to also take the $1,000. But if any of these aren't true, then you'd better find some reason or way not to take that $1,000, or you'll never see the million in the first place, or you'll never see it in reality, or you'll never see it in the future.

Put more simply, two-boxing is the right answer in the cases where Omega is wrong.


How can you know what maximises your utility without having a sound underlying theory? (But NOT, as I said in my other comment, a sound decision theory. You have to know whether free will is real, or whether predictors are impossible. Then you might be able to have a decision theory adequate to the problem.)

If Omega's prediction was correct, then the world where these events take place is not real; it's a figment of past-Omega's imagination, predicting what you'd do. Your actions in the counterfactual just convinced past-Omega not to put the $1M in the box in the real world. Usually you don't have a way of knowing whether the world is real, or what relative weight of reality it holds, though in this case the reality of the world was under your control.

I don't find the simulation argument very compelling. I can conceive of many ways for Omega to arrive at a prediction with high probability of being correct that don't involve a full, particle-by-particle simulation of the actors.

[This comment is no longer endorsed by its author]

Consider the distinction between a low level detailed simulation of a world where you are making a decision, and high level reasoning about your decision making. How would you know which one is being applied to you, from within? If there is a way of knowing that, you can act differently in these scenarios, so that the low level simulation won't show the same outcome as the prediction made with high level reasoning. A good process of making predictions by high level reasoning won't allow there to be a difference.

The counterfactual world I'm talking about does not have to exist in any way similar to the real world, such as by being explicitly simulated. It only needs the implied existence of worldbuilding of a fictional story. The difference from a fictional story is that the story is not arbitrary, there is a precise purpose that shapes the story needed for prediction. And for a fictional character, there is no straightforward way of noticing the fictional nature of the world.

Ah, that makes sense.

I don't see the point of making the first box opaque, if you can open it and then decide whether to take the second box.

I just did that to be consistent with the traditional formulation of Newcomb's problem; it's not relevant to the story. I needed some labels for the boxes, and "box A" and "box B" are not very descriptive and make it easy for the reader to forget which is which.

Here's a dumb question...  In the version of this paradox where some agent can perfectly predict the future, why is it meaningful or useful to talk about "decisions" one might make?

Because that agent is following predictable rules. It's more like working with a force of nature than with an entity we'd intuitively think of as having free will.

Assuming perfect prediction by Omega:

She opens the second box, and goes home with $1,001,000. Why shouldn't she? Omega's dead.

No, she goes home with $1,000. She two-boxes, so Omega predicted she'd two-box.

"But Omega is dead, and she sees the $1,000,000."

Sure, but you can't have it both ways. You can't have a perfect predictor and have Irene disprove the prediction.

If Omega's prediction isn't perfect, but e.g. 99% accurate, two-boxing on a one-box prediction might be possible, but planning to two-box still isn't wise, since if you plan to one-box instead, Omega will almost certainly have put the $1,000,000 in the box.

"Good point", says Irene. She opens the second box, and goes home with $1,001,000. Why shouldn't she? Omega's dead.

Epilogue: The next day, Irene is arrested for attempting to deposit obviously forged notes. One of the notes buried in one of the stacks was just a cartoon of Omega thinking .oO(Sigh).

Fortunately the authorities believe Irene's story of how she got the fake money. This sort of thing happens all the time, after all. And she did get to keep the real $1000.

In a Newcombless problem, where you can either have $1,000 or refuse it and have $1,000,000, you could argue that the rational choice is to take the $1,000,000 and then go back for the $1,000 when people's backs are turned, but it would seem to go against the nature of the problem.

In much the same way, if Omega is a perfect predictor, there is no possible world where you receive the $1,000,000 and still end up going back for the second box. Either Rachel wouldn't have objected, or the argument would've taken more than 5 minutes and the boxes would have disappeared, or something.

I'm not sure how Omega factors in the boxes' contents in this "delayed decision" version. Like, let's say Irene will, absent external forces, one-box, and Rachel, if Irene receives $1,000,000, will threaten Irene sufficiently to make her take the second box, and will do nothing if Irene receives nothing. (Also, they're automatons, and these are descriptions of their source code, so no other unstated factors can be taken into account.)

Omega simulates reality A, with the box full, and sees that Irene will two-box after Rachel's threat.

Omega simulates reality B, with the box empty, and sees that Irene will one-box.

Omega, the perfect predictor, cannot make a consistent prediction, and, like the unstoppable force meeting the immovable object, vanishes in a puff of logic.
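
For concreteness, here's a minimal sketch of that setup (Python; the policy functions and their names are my own hypothetical stand-ins for the "source code" described above). It just checks whether either possible prediction is confirmed by Omega's own simulation of it:

```python
# Hypothetical stand-ins for the automatons' "source code" described above.

def rachel_threatens(opaque_box_full):
    """Rachel threatens Irene only if Irene actually received the $1,000,000."""
    return opaque_box_full

def irene_two_boxes(opaque_box_full, threatened):
    """Irene one-boxes by default, but caves and takes the second box if threatened."""
    return threatened

def consistent_predictions():
    """Return every prediction Omega could make that its own simulation confirms."""
    consistent = []
    for predicted_two_box in (False, True):
        box_full = not predicted_two_box             # Omega fills the box iff it predicts one-boxing
        threatened = rachel_threatens(box_full)      # Rachel reacts to what Irene received
        actual_two_box = irene_two_boxes(box_full, threatened)
        if actual_two_box == predicted_two_box:      # the prediction must match the simulated outcome
            consistent.append(predicted_two_box)
    return consistent

print(consistent_predictions())  # [] -- neither prediction is self-consistent
```

Whichever prediction Omega tries, the simulated world contradicts it, which is the "puff of logic" above.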

I think, if you want to aim at this sort of thing, the better formulation is to just claim that Omega is 90% accurate. Then there's no (immediate) logical contradiction in receiving the $1,000,000 and going back for the second box. And the payoffs should still be correct.

1 box: .9*1,000,000 + .1*0 = 900,000

2 box: .9*1,000 + .1*1,001,000 = 101,000
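
A quick sanity check of those payoffs (a sketch assuming, as proposed, that Omega's prediction simply matches your actual choice 90% of the time):

```python
P_RIGHT = 0.9  # assumed probability that Omega's prediction matches what you actually do

def expected_payoff(two_box):
    # The opaque box contains $1,000,000 iff Omega predicted one-boxing.
    p_full = (1 - P_RIGHT) if two_box else P_RIGHT
    bonus = 1_000 if two_box else 0  # the transparent $1,000, taken only when two-boxing
    return p_full * 1_000_000 + bonus

print(f"one-box: {expected_payoff(two_box=False):,.0f}")  # one-box: 900,000
print(f"two-box: {expected_payoff(two_box=True):,.0f}")   # two-box: 101,000
```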

I expect that this formulation runs afoul of what was discussed in this post around the Smoking Lesion problem, where repeated trials may let you change things you shouldn't be able to (in their example, if you choose to smoke every time, and the correlation between smoking and lesions is held fixed, then you can change the base rate of the lesions).

That is, I expect that if you ran repeated simulations to try things out, then strategies like "I will one-box, and iff it is full, then I will go back for the second box" will make it so Omega is incapable of predicting at the proposed 90% rate.
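
A small illustration of why (same hypothetical modelling as in the sketch above): against the policy "go back for the second box iff the opaque box is full", every prediction Omega could make comes out wrong, so no fixed 90% accuracy can hold.

```python
def conditional_policy(opaque_box_full):
    """Two-box exactly when the opaque box turns out to be full."""
    return opaque_box_full

for predicted_two_box in (False, True):
    box_full = not predicted_two_box               # Omega fills the box iff it predicts one-boxing
    actual_two_box = conditional_policy(box_full)
    print(f"predicted two-box: {predicted_two_box}, "
          f"actual two-box: {actual_two_box}, "
          f"prediction correct: {predicted_two_box == actual_two_box}")
# Both lines print "prediction correct: False" -- Omega is wrong every time against
# this policy, so it cannot maintain the proposed 90% accuracy.
```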

I think all of these things might be related to the problem of embedded agency, and people being confused (even if they don't put it in these terms) into thinking they have an atomic free will that can think about things without affecting or being affected by the world. I'm having trouble resolving this confusion myself, because I can't figure out what Omega's prediction looks like instead of vanishing in a puff of logic. It may just be that statements like "I will turn the lever on if, and only if, I expect the lever to be off at the end" are nonsense decision criteria. But the problem as stated doesn't seem like it should be impossible, so... I am confused.

Some clarifications on my intentions writing this story.

Omega being dead and Irene having taken the money from one box before having the conversation with Rachel are both not relevant to the core problem. I included them as a literary flourish to push people's intuitions towards thinking that Irene should open the second box, similar to what Eliezer was doing here.

Omega was wrong in this scenario, which departs from the traditional Newcomb's problem. I could have written an ending where Rachel made the same arguments and Irene still decided against doing it, but that seemed less fun. It's not relevant whether Omega was right or wrong, because after Irene has made her decision, she always has the "choice" to take the extra money and prove Omega wrong. My point here is that leaving the $1000 behind falls prey to the same "rational agents should win" problem that's usually used to justify one-boxing. After taking the $1,000,000 you can either have some elaborate justification for why it would be irrational to open the second box, or you could just... do it.

Here's another version of the story that might demonstrate this more succinctly:

Irene wakes up in her apartment one morning and finds Omega standing before her with $1,000,000 on her bedside table and a box on the floor next to it. Omega says "I predicted your behavior in Newcomb's problem and guessed that you'd take only one box, so I've expedited the process and given you the money straight away, no decision needed. There's $1000 in that box on the floor, you can throw it away in the dumpster out back. I have 346 other thought experiments to get to today, so I really need to get going."

I think you have to consider what winning means more carefully.

A rational agent doesn't buy a lottery ticket because it's a bad bet. If that ticket ends up winning, does that contradict the principle that "rational agents win"?

An Irene who acts like your model of Irene will win slightly more when Omega makes an incorrect prediction (she wins the lottery), but will be given the million dollars far less commonly because Omega is almost always correct. On average she loses. And rational agents win on average.

By average I don't mean average within a particular world (repeated iteration), but on average across all possible worlds.

Updateless Decision Theory helps you model this kind of thing.

I think you have to consider what winning means more carefully.

A rational agent doesn't buy a lottery ticket because it's a bad bet. If that ticket ends up winning, does that contradict the principle that "rational agents win"?

That doesn't seem at all analogous. At the time they had the opportunity to purchase the ticket, they had no way to know it was going to win.

An Irene who acts like your model of Irene will win slightly more when Omega makes an incorrect prediction (she wins the lottery), but will be given the million dollars far less commonly because Omega is almost always correct. On average she loses. And rational agents win on average.

By average I don't mean average within a particular world (repeated iteration), but on average across all possible worlds.

I agree with all of this. I'm not sure why you're bringing it up?

I'm showing why a rational agent would not take the $1,000, and that doesn't contradict "rational agents win".


I don't see how winning can be defined without making some precise assumptions about the mechanics... how Omega's predictive abilities work, whether you have free will anyway, and so on. Consider trying to determine what the winning strategy is by writing a program.

Why would you expect one decision theory to work in any possible universe?

Eliezer's alteration of the conditions very much strengthens the prisoner's dilemma. Your alterations very much weaken the original problem, both by reducing the strength of evidence for Omega's hidden prediction and by allowing a second decision after (apparently) receiving a prize.

Whether Omega ended up being right or wrong is irrelevant to the problem, since the players only find out if it was right or wrong after all decisions have been made. It has no bearing on what decision is correct at the time; only our prior probability of whether Omega will be right or wrong matters.

the players only find out if [Omega] was right or wrong after all decisions have been made

If you observe Omega being wrong, that's not the same thing as Omega being wrong in reality, because you might be making observations in a counterfactual. Omega is only stipulated to be a good predictor in reality, not in the counterfactuals generated by Omega's alternative decisions about what to predict. (It might be the right decision principle to expect Omega being correct in the counterfactuals generated by your decisions, even though it's not required by the problem statement either.)

It is extremely relevant to the original problem. The whole point is that Omega is known to always be correct. This version weakens that premise, and the whole point of the thought experiment.

In particular, note that the second decision was based on a near-certainty that Omega was wrong. There is some extraordinarily strong evidence in favour of it, since the agent is apparently in possession of a million dollars with nothing to prevent getting the thousand as well. Is that evidence strong enough to cancel out the previous evidence that Omega is always right? Who knows? There is no quantitative basis given on either side.

And that's why this thought experiment is so much weaker and less interesting than the original.

This variant is known as the Transparent Newcomb's Problem (cousin_it alluded to this in his comment). It illustrates different things, such as the need to reason so that the counterfactuals show the outcomes you want them to show because of your counterfactual behavior (or, as I like to look at it, the need to take seriously the possibility of being in a counterfactual). It also illustrates the need to notice that Omega can be wrong in certain counterfactuals even while the stipulation that Omega is always right holds, with there being a question of which counterfactuals it should still be right in. Perhaps it's not useful for illustrating some things the original variant is good at illustrating, but that doesn't make it uninteresting in its own right.