Outlawing Anthropics: An Updateless Dilemma

Eliezer Yudkowsky

8th Sep 2009

Let us start with a (non-quantum) logical coinflip - say, look at the heretofore-unknown-to-us-personally 256th binary digit of pi, where the choice of binary digit is itself intended not to be random.

If the result of this logical coinflip is 1 (aka "heads"), we'll create 18 of you in green rooms and 2 of you in red rooms, and if the result is "tails" (0), we'll create 2 of you in green rooms and 18 of you in red rooms.

After going to sleep at the start of the experiment, you wake up in a green room.

With what degree of credence do you believe - what is your posterior probability - that the logical coin came up "heads"?

There are exactly two tenable answers that I can see, "50%" and "90%".

Suppose you reply 90%.
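As a minimal sketch of where the 90% figure comes from, assume you treat yourself as equally likely to be any one of the 20 copies, and treat "I woke up in a green room" as ordinary evidence. The function name and parameters below are illustrative, not part of the original problem statement.

```python
from fractions import Fraction

def posterior_heads_given_green(prior_heads=Fraction(1, 2),
                                green_if_heads=18, green_if_tails=2, total=20):
    """Bayes update on 'I am in a green room' (assumes uniform sampling over copies)."""
    like_heads = Fraction(green_if_heads, total)   # P(green | heads) = 18/20
    like_tails = Fraction(green_if_tails, total)   # P(green | tails) = 2/20
    numerator = prior_heads * like_heads
    return numerator / (numerator + (1 - prior_heads) * like_tails)

print(posterior_heads_given_green())  # 9/10
```

The "50%" answer corresponds to refusing to perform this update at all and keeping the prior on the logical coin.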

And suppose you also happen to be "altruistic" enough to care about what happens to all the copies of yourself. (If your current system cares about yourself and your future, but doesn't care about very similar xerox-siblings, then you will tend to self-modify to have future copies of yourself care about each other, as this maximizes your expectation of pleasant experience over future selves.)

Then I attempt to force a reflective inconsistency in your decision system, as follows:

I inform you that, after I look at the unknown binary digit of pi, I will ask all the copies of you in green rooms whether to pay $1 to every version of you in a green room and steal $3 from every version of you in a red room. If they all reply "Yes", I will do so.

(It will be understood, of course, that $1 represents 1 utilon, with actual monetary amounts rescaled as necessary to make this happen. Very little rescaling should be necessary.)

(Timeless decision agents reply as if controlling all similar decision processes, including all copies of themselves. Classical causal decision agents, to reply "Yes" as a group, will need to somehow work out that other copies of themselves reply "Yes", and then reply "Yes" themselves. We can try to help out the causal decision agents on their coordination problem by supplying rules such as "If conflicting answers are delivered, everyone loses $50". If causal decision agents can win on the problem "If everyone says 'Yes' you all get $10, if everyone says 'No' you all lose $5, if there are conflicting answers you all lose $50" then they can presumably handle this. If not, then ultimately, I decline to be responsible for the stupidity of causal decision agents.)

Suppose that you wake up in a green room. You reason, "With 90% probability, there are 18 of me in green rooms and 2 of me in red rooms; with 10% probability, there are 2 of me in green rooms and 18 of me in red rooms. Since I'm altruistic enough to at least care about my xerox-siblings, I calculate the expected utility of replying 'Yes' as (90% * ((18 * +$1) + (2 * -$3))) + (10% * ((18 * -$3) + (2 * +$1))) = +$5.60." You reply yes.

However, before the experiment, you calculate the general utility of the conditional strategy "Reply 'Yes' to the question if you wake up in a green room" as (50% * ((18 * +$1) + (2 * -$3))) + (50% * ((18 * -$3) + (2 * +$1))) = -$20. You want your future selves to reply 'No' under these conditions.
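The two calculations differ only in the credence assigned to heads, which a short sketch makes explicit. The helper names are hypothetical; the payoffs (+$1 per green room, -$3 per red room) and room counts are those stated in the problem.

```python
from fractions import Fraction

def group_payoff(n_green, n_red, green_gain=1, red_loss=-3):
    """Total utilons across all copies if every green-roomer says 'Yes'."""
    return n_green * green_gain + n_red * red_loss

def expected_utility_of_yes(p_heads):
    heads_total = group_payoff(n_green=18, n_red=2)   # +12
    tails_total = group_payoff(n_green=2, n_red=18)   # -52
    return p_heads * heads_total + (1 - p_heads) * tails_total

after_waking = expected_utility_of_yes(Fraction(9, 10))      # anthropically updated credence
before_experiment = expected_utility_of_yes(Fraction(1, 2))  # prior credence

print(float(after_waking))       # 5.6
print(float(before_experiment))  # -20.0
```

Under these payoffs the sign flips at a credence of 13/16, so any update from 50% to above roughly 81% is enough to produce the disagreement between the earlier and later self.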

This is a dynamic inconsistency - different answers at different times - which argues that decision systems which update on anthropic evidence will self-modify not to update probabilities on anthropic evidence.

I originally thought, on first formulating this problem, that it had to do with double-counting the utilons gained by your variable numbers of green friends, and the probability of being one of your green friends.

However, the problem also works if we care about paperclips. No selfishness, no altruism, just paperclips.

Let the dilemma be, "I will ask all people who wake up in green rooms if they are willing to take the bet 'Create 1 paperclip if the logical coinflip came up heads, destroy 3 paperclips if the logical coinflip came up tails'. (Should they disagree on their answers, I will destroy 5 paperclips.)" Then a paperclip maximizer, before the experiment, wants the paperclip maximizers who wake up in green rooms to refuse the bet. But a conscious paperclip maximizer who updates on anthropic evidence, who wakes up in a green room, will want to take the bet, with expected utility ((90% * +1 paperclip) + (10% * -3 paperclips)) = +0.6 paperclips.
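The paperclip version can be checked the same way. This is a sketch with a hypothetical helper name; the stakes (+1 paperclip on heads, -3 on tails) are the ones given in the bet, and it assumes all green-roomers answer alike so the disagreement penalty never triggers.

```python
from fractions import Fraction

def paperclip_bet_value(p_heads, win=1, lose=-3):
    """Expected paperclips from accepting the bet, given a credence in heads."""
    return p_heads * win + (1 - p_heads) * lose

updated = paperclip_bet_value(Fraction(9, 10))  # credence after waking in green
prior = paperclip_bet_value(Fraction(1, 2))     # credence before the experiment

print(float(updated))  # 0.6
print(float(prior))    # -1.0
```

The bet is made once per world rather than once per copy, so no sum over copies appears here, and the inconsistency remains.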

This argues that, in general, decision systems - whether they start out selfish, or start out caring about paperclips - will not want their future versions to update on anthropic "evidence".

Well, that's not too disturbing, is it? I mean, the whole anthropic thing seemed very confused to begin with - full of notions about "consciousness" and "reality" and "identity" and "reference classes" and other poorly defined terms. Just throw out anthropic reasoning, and you won't have to bother.

When I explained this problem to Marcello, he said, "Well, we don't want to build conscious AIs, so of course we don't want them to use anthropic reasoning", which is a fascinating sort of reply. And I responded, "But when you have a problem this confusing, and you find yourself wanting to build an AI that just doesn't use anthropic reasoning to begin with, maybe that implies that the correct resolution involves us not using anthropic reasoning either."

So we can just throw out anthropic reasoning, and relax, and conclude that we are Boltzmann brains. QED.

In general, I find the sort of argument given here - that a certain type of decision system is not reflectively consistent - to be pretty damned compelling. But I also find the Boltzmann conclusion to be, ahem, more than ordinarily unpalatable.

In personal conversation, Nick Bostrom suggested that a division-of-responsibility principle might cancel out the anthropic update - i.e., the paperclip maximizer would have to reason, "If the logical coin came up heads then I am 1/18th responsible for adding +1 paperclip, if the logical coin came up tails then I am 1/2 responsible for destroying 3 paperclips." I confess that my initial reaction to this suggestion was "Ewwww", but I'm not exactly comfortable concluding I'm a Boltzmann brain, either.
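One way to read the division-of-responsibility suggestion is sketched below. It assumes each green-roomer weights the outcome by 1/(number of green-roomers in that world); that reading, and the function name, are assumptions made for illustration rather than a formal statement of Bostrom's principle.

```python
from fractions import Fraction

def responsibility_weighted_value(p_heads, n_green_heads=18, n_green_tails=2,
                                  win=1, lose=-3):
    """Expected paperclips attributed to one green-roomer's 'Yes'."""
    share_if_heads = Fraction(1, n_green_heads)  # 1/18 responsible for +1
    share_if_tails = Fraction(1, n_green_tails)  # 1/2 responsible for -3
    return p_heads * share_if_heads * win + (1 - p_heads) * share_if_tails * lose

value = responsibility_weighted_value(Fraction(9, 10))
print(value)  # -1/10
```

With the 90% credence, both worlds end up with the same weight (0.9/18 = 0.1/2 = 1/20), so the result is the prior expectation of -1 paperclip scaled by 1/10: negative, and the updated agent refuses the bet just as its earlier self would want.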

EDIT: On further reflection, I also wouldn't want to build an AI that concluded it was a Boltzmann brain! Is there a form of inference which rejects this conclusion without relying on any reasoning about subjectivity?

EDIT2: Psy-Kosh has converted this into a non-anthropic problem!

210 Comments

nshepperd · 14y · 20

Huh. Reading this again, together with byrnema's pointer discussion and Psy-Kosh's non-anthropic reformulation...

It seems like the problem is that whether each person gets to make a decision depends on the evidence they think they have, in such a way as to make that evidence meaningless. To construct an extreme example: The Antecedent Mugger gathers a billion people in a room together, and says:

"I challenge you to a game of wits! In this jar are coins worth some amount between $0 and $10,000. I will allow each of you to weigh the jar using this set of extremely imprecise scales. Then I will ask each of you whether to accept my offer: to buy the jar from me, as a group, for $5000, with the money in it distributed equally among you. Note: although I will ask all of you, the only response I will consider is the one given by the person with the greatest subjective expected utility from saying 'yes'."

In this case, even if the jar always contains $0, there will always be someone who receives enough information from the scales to think the jar contains >$5000 with high probability, and therefore to say yes. Since that person's response is the one that is taken for the whole group, the group always pays out $5000, resulting in a money pump in favour of the Mugger.

The problem is that, from an outside perspective, the observations of the one who gets to make the choice are almost completely uncorrelated with the actual contents of the jar, due to the Mugger's selection process. For any general strategy Observations → Response, the Mugger can always summon enough people to find someone who has seen the observations that will produce the response he wants, unless the strategy is a constant function.
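A small simulation illustrates the selection effect. It is only a sketch under assumed numbers: the scales are modelled as adding Gaussian noise with a standard deviation of $2000, each person naively takes their reading at face value, the person with the highest reading is the one the Mugger listens to, and the crowd is scaled down from a billion to 100,000 to keep the run short.

```python
import random

def mugger_outcome(true_value, n_people=100_000, noise_sd=2000.0,
                   price=5000.0, seed=0):
    """Return (highest reading, whether the selected person says 'yes')."""
    rng = random.Random(seed)
    readings = (true_value + rng.gauss(0, noise_sd) for _ in range(n_people))
    top_reading = max(readings)
    # The naive decider accepts whenever their own reading exceeds the price.
    return top_reading, top_reading > price

top, group_buys = mugger_outcome(true_value=0.0)  # the jar is actually empty
print(round(top), group_buys)
```

Even with an empty jar, the highest of that many noisy readings lands well above $5000, so the selected person accepts and the group pays.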

Similarly, in the problem with the marbles, only the people with the observation Green get any influence, so the observations of "people who get to make a decision" are uncorrelated with the actual contents of the buckets (even though observations of the participants in general are correlated with the buckets).

Kindly · 14y · 20

The problem here is that your billion people are for some reason giving the answer most likely to be correct rather than the answer most likely to actually be profitable. If they were a little more savvy, they could reason as follows:

"The scales tell me that there's $6000 worth of coins in the jar, so it seems like a good idea to buy the jar. However, if I did not receive the largest weight estimate from the scales, my decision is irrelevant; and if I did receive the largest weight estimate, then conditioned on that it seems overwhelmingly likely that there are many fewer coins in the jar than I'd think based on that estimate -- and in that case, I ought to say no."
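This correction resembles what is usually called the winner's curse: the largest of many noisy estimates overshoots the truth. The sketch below estimates by how much, under assumed numbers (jar value uniform on $0 to $10,000, Gaussian scale noise with a $2000 standard deviation, 1000 participants); all of these parameters are hypothetical.

```python
import random

def mean_overshoot_of_top_reading(n_people=1000, noise_sd=2000.0,
                                  trials=200, seed=0):
    """Average of (highest reading - true jar value) over many simulated jars."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        true_value = rng.uniform(0, 10_000)
        top = max(true_value + rng.gauss(0, noise_sd) for _ in range(n_people))
        total += top - true_value
    return total / trials

overshoot = mean_overshoot_of_top_reading()
print(round(overshoot))
```

A participant who conditions on "my answer only matters if my reading was the largest" should subtract an overshoot of this size before comparing the estimate to the $5000 price.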
