Outlawing Anthropics: An Updateless Dilemma

Eliezer Yudkowsky

8th Sep 2009

Let us start with a (non-quantum) logical coinflip - say, look at the heretofore-unknown-to-us-personally 256th binary digit of pi, where the choice of binary digit is itself intended not to be random.

If the result of this logical coinflip is 1 (aka "heads"), we'll create 18 of you in green rooms and 2 of you in red rooms, and if the result is "tails" (0), we'll create 2 of you in green rooms and 18 of you in red rooms.

After going to sleep at the start of the experiment, you wake up in a green room.

With what degree of credence do you believe - what is your posterior probability - that the logical coin came up "heads"?

There are exactly two tenable answers that I can see, "50%" and "90%".
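For readers who want to see where the "90%" comes from, here is a minimal sketch of the update, assuming that the likelihood of finding yourself in a green room is simply the fraction of copies placed in green rooms. That assumption is exactly what the "50%" answer declines to make; the function and variable names are illustrative, not part of the original problem statement.

```python
from fractions import Fraction

# Prior on the logical coin: the digit is unknown to us, so 1/2 each way.
prior = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}

# Copies created in green rooms under each outcome (20 copies in total).
green_copies = {"heads": 18, "tails": 2}
TOTAL_COPIES = 20


def posterior_heads_given_green():
    """Bayes' rule, ASSUMING P(green | outcome) = green copies / all copies."""
    joint = {
        outcome: prior[outcome] * Fraction(green_copies[outcome], TOTAL_COPIES)
        for outcome in prior
    }
    return joint["heads"] / sum(joint.values())


print(posterior_heads_given_green())  # 9/10
```

Refusing to treat "I woke up in a green room" as evidence leaves the prior untouched, which is the "50%" answer.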

Suppose you reply 90%.

And suppose you also happen to be "altruistic" enough to care about what happens to all the copies of yourself. (If your current system cares about yourself and your future, but doesn't care about very similar xerox-siblings, then you will tend to self-modify to have future copies of yourself care about each other, as this maximizes your expectation of pleasant experience over future selves.)

Then I attempt to force a reflective inconsistency in your decision system, as follows:

I inform you that, after I look at the unknown binary digit of pi, I will ask all the copies of you in green rooms whether to pay $1 to every version of you in a green room and steal $3 from every version of you in a red room. If they all reply "Yes", I will do so.

(It will be understood, of course, that $1 represents 1 utilon, with actual monetary amounts rescaled as necessary to make this happen. Very little rescaling should be necessary.)

(Timeless decision agents reply as if controlling all similar decision processes, including all copies of themselves. Classical causal decision agents, to reply "Yes" as a group, will need to somehow work out that other copies of themselves reply "Yes", and then reply "Yes" themselves. We can try to help out the causal decision agents on their coordination problem by supplying rules such as "If conflicting answers are delivered, everyone loses $50". If causal decision agents can win on the problem "If everyone says 'Yes' you all get $10, if everyone says 'No' you all lose $5, if there are conflicting answers you all lose $50" then they can presumably handle this. If not, then ultimately, I decline to be responsible for the stupidity of causal decision agents.)

Suppose that you wake up in a green room. You reason, "With 90% probability, there are 18 of me in green rooms and 2 of me in red rooms; with 10% probability, there are 2 of me in green rooms and 18 of me in red rooms. Since I'm altruistic enough to at least care about my xerox-siblings, I calculate the expected utility of replying 'Yes' as (90% * ((18 * +$1) + (2 * -$3))) + (10% * ((18 * -$3) + (2 * +$1))) = +$5.60." You reply yes.

However, before the experiment, you calculate the general utility of the conditional strategy "Reply 'Yes' to the question if you wake up in a green room" as (50% * ((18 * +$1) + (2 * -$3))) + (50% * ((18 * -$3) + (2 * +$1))) = -$20. You want your future selves to reply 'No' under these conditions.
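The two calculations differ only in the probability assigned to heads. A small sketch, with illustrative names and using exact fractions to avoid rounding, makes the sign flip explicit:

```python
from fractions import Fraction

GREEN_PAYOFF = 1   # each green-room copy gains $1 if everyone says "Yes"
RED_PAYOFF = -3    # each red-room copy loses $3 if everyone says "Yes"


def total_payoff(n_green, n_red):
    """Summed payoff over all 20 copies when the green rooms say 'Yes'."""
    return n_green * GREEN_PAYOFF + n_red * RED_PAYOFF


HEADS_TOTAL = total_payoff(18, 2)   # +12
TAILS_TOTAL = total_payoff(2, 18)   # -52


def expected_utility_of_yes(p_heads):
    """Expected total payoff of 'Yes' given a credence p_heads in heads."""
    return p_heads * HEADS_TOTAL + (1 - p_heads) * TAILS_TOTAL


eu_after_waking = expected_utility_of_yes(Fraction(9, 10))  # 28/5, i.e. +$5.60
eu_before_start = expected_utility_of_yes(Fraction(1, 2))   # -20

print(float(eu_after_waking), float(eu_before_start))  # 5.6 -20.0
```

Under these payoffs the break-even credence is 13/16 (about 81%), so any updated credence above that recommends "Yes" while the 50% prior recommends "No".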

This is a dynamic inconsistency - different answers at different times - which argues that decision systems which update on anthropic evidence will self-modify not to update probabilities on anthropic evidence.

I originally thought, on first formulating this problem, that it had to do with double-counting the utilons gained by your variable numbers of green friends, and the probability of being one of your green friends.

However, the problem also works if we care about paperclips. No selfishness, no altruism, just paperclips.

Let the dilemma be, "I will ask all people who wake up in green rooms if they are willing to take the bet 'Create 1 paperclip if the logical coinflip came up heads, destroy 3 paperclips if the logical coinflip came up tails'. (Should they disagree on their answers, I will destroy 5 paperclips.)" Then a paperclip maximizer, before the experiment, wants the paperclip maximizers who wake up in green rooms to refuse the bet. But a conscious paperclip maximizer who updates on anthropic evidence, who wakes up in a green room, will want to take the bet, with expected utility ((90% * +1 paperclip) + (10% * -3 paperclips)) = +0.6 paperclips.
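The paperclip version can be checked the same way. This sketch (names are illustrative) computes the bet's expected value at the pre-experiment credence of 50% and at the updated credence of 90%, ignoring the disagreement penalty on the assumption that all copies answer alike:

```python
from fractions import Fraction


def expected_paperclips(p_heads, gain_if_heads=1, loss_if_tails=-3):
    """Expected paperclips from accepting the bet at credence p_heads."""
    return p_heads * gain_if_heads + (1 - p_heads) * loss_if_tails


before_experiment = expected_paperclips(Fraction(1, 2))   # -1 paperclip
after_green_room = expected_paperclips(Fraction(9, 10))   # +3/5 paperclip

print(before_experiment, after_green_room)  # -1 3/5
```

The payoff here does not depend on how many copies exist, so the reversal cannot be blamed on counting the utility of a variable number of green-room copies.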

This argues that, in general, decision systems - whether they start out selfish, or start out caring about paperclips - will not want their future versions to update on anthropic "evidence".

Well, that's not too disturbing, is it? I mean, the whole anthropic thing seemed very confused to begin with - full of notions about "consciousness" and "reality" and "identity" and "reference classes" and other poorly defined terms. Just throw out anthropic reasoning, and you won't have to bother.

When I explained this problem to Marcello, he said, "Well, we don't want to build conscious AIs, so of course we don't want them to use anthropic reasoning", which is a fascinating sort of reply. And I responded, "But when you have a problem this confusing, and you find yourself wanting to build an AI that just doesn't use anthropic reasoning to begin with, maybe that implies that the correct resolution involves us not using anthropic reasoning either."

So we can just throw out anthropic reasoning, and relax, and conclude that we are Boltzmann brains. QED.

In general, I find the sort of argument given here - that a certain type of decision system is not reflectively consistent - to be pretty damned compelling. But I also find the Boltzmann conclusion to be, ahem, more than ordinarily unpalatable.

In personal conversation, Nick Bostrom suggested that a division-of-responsibility principle might cancel out the anthropic update - i.e., the paperclip maximizer would have to reason, "If the logical coin came up heads then I am 1/18th responsible for adding +1 paperclip, if the logical coin came up tails then I am 1/2 responsible for destroying 3 paperclips." I confess that my initial reaction to this suggestion was "Ewwww", but I'm not exactly comfortable concluding I'm a Boltzmann brain, either.
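As a rough check on Bostrom's suggestion, the following sketch applies the 1/18 and 1/2 responsibility shares to the paperclip bet while keeping the anthropically updated 90% credence. The function name is illustrative, and this is only one reading of how the shares would enter the calculation:

```python
from fractions import Fraction


def responsibility_weighted_value(p_heads):
    """Expected paperclips credited to one green-room decider.

    Heads: 18 deciders split the credit for +1 paperclip.
    Tails: 2 deciders split the blame for -3 paperclips.
    """
    heads_share = Fraction(1, 18) * 1
    tails_share = Fraction(1, 2) * -3
    return p_heads * heads_share + (1 - p_heads) * tails_share


value = responsibility_weighted_value(Fraction(9, 10))

print(value)  # -1/10
```

On this reading the result is negative, so the updated agent refuses the bet. It is also exactly one tenth of the pre-experiment value of -1 paperclip, because 0.9/18 and 0.1/2 both equal 0.5/10: the shares undo the size weighting that produced the 90% in the first place.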

EDIT: On further reflection, I also wouldn't want to build an AI that concluded it was a Boltzmann brain! Is there a form of inference which rejects this conclusion without relying on any reasoning about subjectivity?

EDIT2: Psy-Kosh has converted this into a non-anthropic problem!

Anthropics

Personal Blog

210 Comments

Joanna Morningstar, 17y

I've been watching for a while, but have never commented, so this may be horribly flawed, opaque or otherwise unhelpful.

I think the problem is entirely caused by the use of the wrong sets of belief, and that anything holding to Eliezer's 1-line summary of TDT or alternatively UDT should get this right.

Suppose that you're a rational agent. Since you are instantiated in multiple identical circumstances (green rooms) and asked identical questions, your answers should be identical. Hence if you wake up in a green room and you're asked to steal from the red rooms and give to the green rooms, you either commit a group of 2 of you to a loss of 52 or commit a group of 18 of you to a gain of 12.

This committal is what you wish to optimise over from TDT/UDT, and clearly this requires knowledge about the likelihood of different decision making groups. The distribution of sizes of random groups is not the same as the distribution of sizes of groups that a random individual is in. The probabilities of being in a group are upweighted by the size of the group and normalised. This is why Bostrom's suggested 1/n split of responsibility works; it reverses the belief about where a random individual is in a set of decision making groups to a belief about the size of a random decision making group.

By the construction of the problem the probability that a random (group of all the people in green rooms) has size 18 is 0.5, and similarly for 2 the probability is 0.5. Hence the expected utility is (0.5 * 12) + (0.5 * -52) = -20.
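A minimal sketch of the distinction between the two distributions, with illustrative function names: weighting each group's probability by its size and normalising gives the belief of a random individual, and dividing by size and normalising recovers the belief about a random group.

```python
from fractions import Fraction

# Distribution over the SIZE of the green-room group, fixed by the coin.
group_size_prob = {18: Fraction(1, 2), 2: Fraction(1, 2)}


def normalise(weights):
    total = sum(weights.values())
    return {size: w / total for size, w in weights.items()}


def as_seen_by_random_member(group_probs):
    """Upweight each group by its size: larger groups contain more members."""
    return normalise({size: p * size for size, p in group_probs.items()})


def as_seen_by_random_group(member_probs):
    """Undo the size weighting (the effect of a 1/n responsibility split)."""
    return normalise({size: p / size for size, p in member_probs.items()})


member_view = as_seen_by_random_member(group_size_prob)
group_view = as_seen_by_random_group(member_view)

print(member_view[18], member_view[2])  # 9/10 1/10
print(group_view[18], group_view[2])    # 1/2 1/2
```

The 0.9 and 0.1 beliefs are the member view; the 0.5 and 0.5 beliefs used in the group-level expected utility are the group view.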

If you're asked to accept a bet on there being 18 people in green rooms, and you're told that only you're being offered it, then the decision commits exactly one instance of you to a specific loss or gain, regardless of the group you're in. Hence you can't do better than the 0.9 and 0.1 beliefs.

If you're told that the bet is being offered to everyone in a green room, then you are committing to n times the outcome in any group of n people. In this case gains are conditional on group size, and so you have to use the 0.5-0.5 belief about the distribution of groups. It doesn't matter because the larger groups have the larger multiplier and thus shutting up and multiplying yields the same answers as a single-shot bet.

ETA: At some level this is just choosing an optimal output for your calculation of what to do, given that the result is used variably widely.

CarlShulman, 17y

"Hence if you wake up in a green room and you're asked to steal from the red rooms and give to the green rooms, you either commit a group of 2 of you to a loss of 52 or commit a group of 18 of you to a gain of 12."

In the example you care equally about the red room and green room dwellers.

Christian_Szegedy, 17y

I was influenced by the OP and used to think that way. However, I now think that this is not the root problem. What if the agents get more complicated decision problems: for example, rewards depending on the parity of the agents voting a certain way, etc.? I think what is essential is that the agents have to think globally (categorical imperative, hmmm?). Practically: if the agent recognizes that there is a collective decision, then it should model all conceivable available protocols (making sure a priori that all cooperating agents perform the same or a compatible analysis, if they can't communicate) and then choose the protocol with the best overall total gain. In the case of the OP: the second calculation in the OP. (Not messing around with correction factors based on responsibilities, etc.) Special considerations based on group sizes etc. may be incidentally correct in certain situations, but this is just not general enough. The crux is that the ultimate test is simply the expected value computation for the protocol of the whole group.
