Marion Ledwig's dissertation summarizes much of the existing thinking that's gone into Newcomb's Problem.
(For the record, I myself am neither an evidential decision theorist, nor a causal decision theorist in the current sense. My view is not easily summarized, but it is reflectively consistent without need of precommitment or similar dodges; my agents see no need to modify their own source code or invoke abnormal decision procedures on Newcomblike problems.)
It's not clear that reflective consistency is feasible for human beings.
Consider the following thought experiment. You’re about to be copied either once (with probability .99) or twice (with probability .01). After that, one of your two or three instances will be randomly selected to be the decision-maker. He will get to choose from the following options, without knowing how many copies were made:
A: The decision-maker will have a pleasant experience. The other(s) will have unpleasant experience(s).
B: The decision-maker will have an unpleasant experience. The other(s) will have pleasant experience(s).
Presumably, you’d like to commit your future self to pick option B: ex ante it is never worse than A (in the two-copy case each option yields exactly one pleasant experience), and in the 1% case where two copies are made it yields two pleasant experiences instead of one. But without some sort of external commitment device, it’s hard to see how you can prevent your future self, once selected as the decision-maker, from picking option A and keeping the pleasant experience for himself.
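To see how the pre-copy and post-copy preferences come apart, here is a minimal sketch of the arithmetic. It assumes a scoring rule not stated above: an outcome is worth the number of instances having a pleasant experience, and the decision-maker may also put some selfish weight on his own experience.

```python
# Sketch of the arithmetic in the copying thought experiment.
# Assumed (not from the original text): outcomes are scored by how many
# instances get a pleasant experience.

P_TWO_COPIES = 0.99    # copied once  -> 2 instances total
P_THREE_COPIES = 0.01  # copied twice -> 3 instances total

def pleasant_count(option, n_instances):
    """Option A pleases only the decision-maker; option B pleases everyone else."""
    return 1 if option == "A" else n_instances - 1

# Ex-ante (pre-copy) expected number of pleasant experiences.
for option in ("A", "B"):
    ev = (P_TWO_COPIES * pleasant_count(option, 2)
          + P_THREE_COPIES * pleasant_count(option, 3))
    print(f"ex-ante E[pleasant | {option}] = {ev:.2f}")
# A -> 1.00, B -> 1.01: before copying, B is weakly preferred.

# After copying, the decision-maker updates on having been selected
# (probability 1/2 if there are two instances, 1/3 if there are three).
post_two = (P_TWO_COPIES * 0.5) / (P_TWO_COPIES * 0.5 + P_THREE_COPIES / 3)
print(f"P(two copies | I am the decision-maker) = {post_two:.3f}")
# ~0.993: in the now almost certain two-copy case, A and B each yield one
# pleasant experience, so the impartial edge for B is tiny (~0.007), and any
# selfish weight the decision-maker puts on his own experience tips him to A.
```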
Why so complicated? Just split into two selves and play Prisoner's Dilemma with each other. A philosophically inclined person could have major fun with this experiment, e.g. inventing some sort of Agentless Decision Theory, while mathematically inclined people enjoy the show from a safe distance.