Let us start with a (non-quantum) logical coinflip - say, look at the heretofore-unknown-to-us-personally 256th binary digit of pi, where the choice of binary digit is itself intended not to be random.
If the result of this logical coinflip is 1 (aka "heads"), we'll create 18 of you in green rooms and 2 of you in red rooms, and if the result is "tails" (0), we'll create 2 of you in green rooms and 18 of you in red rooms.
After going to sleep at the start of the experiment, you wake up in a green room.
With what degree of credence do you believe - what is your posterior probability - that the logical coin came up "heads"?
There are exactly two tenable answers that I can see, "50%" and "90%".
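Here is a minimal sketch (mine, not part of the dilemma itself) of where the "90%" answer comes from, if you treat yourself as a uniformly random copy among the twenty and do an ordinary Bayesian update on waking in a green room:

```python
# Bayes update, assuming "I am a uniformly random copy among the 20 created":
p_heads = 0.5
p_green_given_heads = 18 / 20
p_green_given_tails = 2 / 20

p_green = p_heads * p_green_given_heads + (1 - p_heads) * p_green_given_tails
p_heads_given_green = p_heads * p_green_given_heads / p_green
print(p_heads_given_green)  # ≈ 0.9
```

The "50%" answer corresponds to refusing that update, i.e., treating waking in a green room as carrying no information about the coin.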
Suppose you reply 90%.
And suppose you also happen to be "altruistic" enough to care about what happens to all the copies of yourself. (If your current system cares about yourself and your future, but doesn't care about very similar xerox-siblings, then you will tend to self-modify to have future copies of yourself care about each other, as this maximizes your expectation of pleasant experience over future selves.)
Then I attempt to force a reflective inconsistency in your decision system, as follows:
I inform you that, after I look at the unknown binary digit of pi, I will ask all the copies of you in green rooms whether to pay $1 to every version of you in a green room and steal $3 from every version of you in a red room. If they all reply "Yes", I will do so.
(It will be understood, of course, that $1 represents 1 utilon, with actual monetary amounts rescaled as necessary to make this happen. Very little rescaling should be necessary.)
(Timeless decision agents reply as if controlling all similar decision processes, including all copies of themselves. Classical causal decision agents, to reply "Yes" as a group, will need to somehow work out that other copies of themselves reply "Yes", and then reply "Yes" themselves. We can try to help out the causal decision agents on their coordination problem by supplying rules such as "If conflicting answers are delivered, everyone loses $50". If causal decision agents can win on the problem "If everyone says 'Yes' you all get $10, if everyone says 'No' you all lose $5, if there are conflicting answers you all lose $50" then they can presumably handle this. If not, then ultimately, I decline to be responsible for the stupidity of causal decision agents.)
Suppose that you wake up in a green room. You reason, "With 90% probability, there are 18 of me in green rooms and 2 of me in red rooms; with 10% probability, there are 2 of me in green rooms and 18 of me in red rooms. Since I'm altruistic enough to at least care about my xerox-siblings, I calculate the expected utility of replying 'Yes' as (90% * ((18 * +$1) + (2 * -$3))) + (10% * ((18 * -$3) + (2 * +$1))) = +$5.60." You reply yes.
However, before the experiment, you calculate the general utility of the conditional strategy "Reply 'Yes' to the question if you wake up in a green room" as (50% * ((18 * +$1) + (2 * -$3))) + (50% * ((18 * -$3) + (2 * +$1))) = -$20. You want your future selves to reply 'No' under these conditions.
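To make the clash concrete, here is a minimal arithmetic check (nothing beyond the numbers above) reproducing both figures:

```python
# Group payoff if everyone in a green room replies "Yes":
payoff_heads = 18 * (+1) + 2 * (-3)   # +12: 18 greens gain $1, 2 reds lose $3
payoff_tails = 2 * (+1) + 18 * (-3)   # -52: 2 greens gain $1, 18 reds lose $3

# Expected utility computed *after* updating on waking in a green room:
eu_updated = 0.9 * payoff_heads + 0.1 * payoff_tails
print(eu_updated)   # ≈ +5.6, so the updated agent replies "Yes"

# Expected utility of the conditional strategy, evaluated *before* the experiment:
eu_advance = 0.5 * payoff_heads + 0.5 * payoff_tails
print(eu_advance)   # -20.0, so in advance you want your future selves to reply "No"
```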
This is a dynamic inconsistency - different answers at different times - which argues that decision systems which update on anthropic evidence will self-modify not to update probabilities on anthropic evidence.
I originally thought, on first formulating this problem, that it had to do with double-counting the utilons gained by your variable numbers of green friends, and the probability of being one of your green friends.
However, the problem also works if we care about paperclips. No selfishness, no altruism, just paperclips.
Let the dilemma be, "I will ask all people who wake up in green rooms if they are willing to take the bet 'Create 1 paperclip if the logical coinflip came up heads, destroy 3 paperclips if the logical coinflip came up tails'. (Should they disagree on their answers, I will destroy 5 paperclips.)" Then a paperclip maximizer, before the experiment, wants the paperclip maximizers who wake up in green rooms to refuse the bet. But a conscious paperclip maximizer who updates on anthropic evidence, who wakes up in a green room, will want to take the bet, with expected utility ((90% * +1 paperclip) + (10% * -3 paperclips)) = +0.6 paperclips.
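The same check in paperclip units (the pre-experiment figure isn't stated above, but it follows in the same way):

```python
# Per-bet paperclip payoffs: +1 if heads, -3 if tails.
eu_updated = 0.9 * (+1) + 0.1 * (-3)   # ≈ +0.6: the updating maximizer in a green room takes the bet
eu_advance = 0.5 * (+1) + 0.5 * (-3)   # -1.0: the pre-experiment maximizer wants the bet refused
print(eu_updated, eu_advance)
```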
This argues that, in general, decision systems - whether they start out selfish, or start out caring about paperclips - will not want their future versions to update on anthropic "evidence".
Well, that's not too disturbing, is it? I mean, the whole anthropic thing seemed very confused to begin with - full of notions about "consciousness" and "reality" and "identity" and "reference classes" and other poorly defined terms. Just throw out anthropic reasoning, and you won't have to bother.
When I explained this problem to Marcello, he said, "Well, we don't want to build conscious AIs, so of course we don't want them to use anthropic reasoning", which is a fascinating sort of reply. And I responded, "But when you have a problem this confusing, and you find yourself wanting to build an AI that just doesn't use anthropic reasoning to begin with, maybe that implies that the correct resolution involves us not using anthropic reasoning either."
So we can just throw out anthropic reasoning, and relax. Of course, reasoning about subjectivity of just that sort is also the usual grounds for believing we are not Boltzmann brains; so, having thrown it out, we can conclude that we are Boltzmann brains. QED.
In general, I find the sort of argument given here - that a certain type of decision system is not reflectively consistent - to be pretty damned compelling. But I also find the Boltzmann conclusion to be, ahem, more than ordinarily unpalatable.
In personal conversation, Nick Bostrom suggested that a division-of-responsibility principle might cancel out the anthropic update - i.e., the paperclip maximizer would have to reason, "If the logical coin came up heads then I am 1/18th responsible for adding +1 paperclip, if the logical coin came up tails then I am 1/2 responsible for destroying 3 paperclips." I confess that my initial reaction to this suggestion was "Ewwww", but I'm not exactly comfortable concluding I'm a Boltzmann brain, either.
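If I follow the suggestion correctly, the arithmetic (my reconstruction, not Bostrom's) comes out like this:

```python
# Division-of-responsibility, as I understand the suggestion:
# heads -> I am one of 18 green deciders, so 1/18 responsible for +1 paperclip;
# tails -> I am one of 2 green deciders, so 1/2 responsible for -3 paperclips.
eu_per_decider = 0.9 * (1 / 18) * (+1) + 0.1 * (1 / 2) * (-3)
print(eu_per_decider)   # ≈ -0.1: negative, so the green-roomers refuse the bet,
                        # matching the answer preferred before the experiment
```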
EDIT: On further reflection, I also wouldn't want to build an AI that concluded it was a Boltzmann brain! Is there a form of inference which rejects this conclusion without relying on any reasoning about subjectivity?
EDIT2: Psy-Kosh has converted this into a non-anthropic problem!
Thinking about how all the green-room people come to the wrong conclusion makes my brain hurt. But I suppose, in the end, it is true: they cannot base their decision on their subjective experience. Here I'll outline some thoughts on the conditions under which they should know they cannot do so.
Suppose there are 20 people (Amy, Benny, Carrie, Donny, ...) and this experiment is done as described. If we always ask Tony (the 20th person) whether or not to say 'yes', and he bases his decision on whether or not he is in a green room, then the expected value of saying 'yes', given that he wakes in a green room, really is +$5.60. Tony here is a special, singled-out "decider". One way of looking at this situation is that the 'yes' depends on some information about the system (namely, whether or not Tony ended up in a green room).
If instead we say that the decider can be anyone, and in fact we choose the decider after the assortment into rooms from among those in green rooms, then we are not really given any information about the system.
It is the difference between (a) picking a person, and seeing if they wake up in a green room, and (b) picking a person that is in a green room. (I know you are well aware of this difference, but it helps to spell it out.)
You can't pick the deciders from a set with a prespecified outcome. It's a pointer problem: you can learn about the system from the change of state from Tony to Tony* (Tony: no room --> Tony*: green room), but you can't assign the star after the assignment (pick someone in a green room and ask them).
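Here is a little Monte Carlo sketch (my own illustration, using the $1/$3 bet from the post; names are made up) of the difference between the two pointers:

```python
# Compare: (a) a decider fixed in advance who happens to wake up green, versus
# (b) a decider picked from the green rooms *because* they are green.
import random

TRIALS = 200_000

def group_payoff_if_yes(heads):
    # +$1 per green copy, -$3 per red copy
    return 18 * 1 + 2 * -3 if heads else 2 * 1 + 18 * -3   # +12 or -52

fixed_decider = []   # (a): payoffs in trials where the pre-designated decider woke up green
chosen_green = []    # (b): someone in a green room always exists, so "yes" is always given

for _ in range(TRIALS):
    heads = random.random() < 0.5
    n_green = 18 if heads else 2
    if random.randrange(20) < n_green:           # (a) Tony, chosen in advance, is green
        fixed_decider.append(group_payoff_if_yes(heads))
    chosen_green.append(group_payoff_if_yes(heads))  # (b)

print(sum(fixed_decider) / len(fixed_decider))   # ≈ +5.6
print(sum(chosen_green) / len(chosen_green))     # ≈ -20
```

The first average is the $5.60 that Tony legitimately computes; the second is the -$20 the group actually earns when the "decider" is guaranteed to be green.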
When a person wakes in a green room and is asked, they should say 'yes' if they are randomly chosen to be asked independently of their room color. If they were chosen after the assignment, because they awoke in a green room, they should recognize this as the “unfixed pointer problem” (a special kind of selection bias).
Avoiding the pointer problem is straightforward. The people who wake in red rooms assign heads a posterior probability of 10%. The people who wake in green rooms assign heads a posterior probability of 90%. Your posterior probability is meaningful only if it could have come out either way. Since Eliezer only asks people who woke in green rooms, and never asks people who woke in red rooms, the posterior probabilities are not meaningful.
The rest of your reply makes sense to me, but can I ask you to amplify on this? Maybe I'm being naive, but to me, a 90% probability is a 90% probability...