Updateless anthropics

8 Stuart_Armstrong 20 February 2011 07:23PM

Three weeks ago, I set out to find a new theory of anthropics, to try and set decision theory on a firm footing with respect to copying, deleting copies, merging them, correlated decisions, and the presence or absence of extra observers. I've since come full circle, and realised that UDT already has a built-in anthropic theory, that resolves a lot of the problems that had been confusing me.

The theory is simple, and is essentially a rephrasing of UDT: if you are facing a decision X, and trying to figure out the utility of X=a for some action a, then calculate the full expected utility of X being a, given the objective probabilities of each world (including those in which you don't exist).

As usual, you have to consider the consequences of X=a for all agents who will make the same decision as you, whether they be exact copies, enemies, simulations or similar-minded people. However, your utility will have to do more work than is usually realised: notions such as selfishness or altruism with respect to your copies have to be encoded in the utility function, and will result in substantially different behaviour.

The rest of the post is a series of case studies illustrating this theory. Utility is assumed to be linear in cash for convenience.

Sleeping with the Presumptuous Philosopher 

The first test case is the Sleeping Beauty problem.


In its simplest form, this involves a coin toss; if it comes out heads, one copy of Sleeping Beauty is created. If it comes out tails, two copies are created. Then the copies are asked at what odds they would be prepared to bet that the coin came out tails. You can assume either that the different copies care for each other in the manner I detailed here, or more simply that all winnings will be kept by a future merged copy (or an approved charity). Then the algorithm is simple: the two worlds have equal probability. Let X be the decision where Sleeping Beauty decides between a contract that pays out $1 if the coin is heads, versus one that pays out $1 if the coin is tails. If X="heads" (to use an obvious shorthand), then Sleeping Beauty will expect to make $1*0.5, as she is offered the contract once. If X="tails", then the total return of that decision is $1*2*0.5, as copies of her will be offered the contract twice, and they will all make the same decision. So Sleeping Beauty will follow the SIA 2:1 betting odds of tails over heads.
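
The arithmetic above can be checked with a minimal sketch. The function name `expected_return` and its parameters are illustrative assumptions, not part of the original argument; the only inputs taken from the text are the fair coin, the $1 contract, and the number of copies in each world.

```python
from fractions import Fraction

def expected_return(bet, copies_on_tails=2, payout=1):
    """Expected total payout, summed over all copies, if every copy picks `bet`."""
    p_world = Fraction(1, 2)                      # fair coin: both worlds equally likely
    copies = {"heads": 1, "tails": copies_on_tails}
    # Only the world matching the bet pays out, once per copy offered the contract.
    return p_world * copies[bet] * payout

heads = expected_return("heads")   # $1 * 0.5
tails = expected_return("tails")   # $1 * 2 * 0.5
print(tails / heads)               # ratio of returns, tails over heads
```

Raising `copies_on_tails` scales the tails return linearly while the heads return stays fixed.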

Variants such as "extreme Sleeping Beauty" (where thousands of copies are created on tails) will behave in the same way; if it feels counter-intuitive to bet at thousands-to-one odds that a fair coin landed tails, it's the fault of expected utility itself, as the rewards of being right dwarf the costs of being wrong.

But now let's turn to the Presumptuous Philosopher, a thought experiment that is often confused with Sleeping Beauty. Here we have exactly the same setup as "extreme Sleeping Beauty", but the agents (the Presumptuous Philosophers) are mutually selfish. Here the return to X="heads" remains $1*0.5. However the return to X="tails" is also $1*0.5, since even if all the Presumptuous Philosophers in the "tails" universe bet on "tails", each one will still only get $1 in utility. So the Presumptuous Philosopher should only take the even, 1:1 SSA betting odds on the result of the coin flip.
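
A small sketch makes the contrast between the two utility functions explicit. The `pooled` flag is a hypothetical stand-in for "winnings are shared between copies" versus "each agent only values its own $1"; it is an illustration under those assumptions, not a general model of selfishness.

```python
from fractions import Fraction

def bet_value(bet, copies_on_tails, pooled):
    """Expected utility, to the deciding agent, of every copy choosing `bet`."""
    p_world = Fraction(1, 2)
    copies = {"heads": 1, "tails": copies_on_tails}[bet]
    # Pooled winnings (Sleeping Beauty): every copy's $1 counts.
    # Selfish agents (Presumptuous Philosopher): only the agent's own $1 counts.
    dollars_counted = copies if pooled else 1
    return p_world * dollars_counted

n = 1000  # "extreme" variant: a thousand copies on tails
sb_ratio = bet_value("tails", n, pooled=True) / bet_value("heads", n, pooled=True)
pp_ratio = bet_value("tails", n, pooled=False) / bet_value("heads", n, pooled=False)
print(sb_ratio, pp_ratio)
```

The same worlds and probabilities go in both times; only the utility function differs, and that alone moves the betting odds from thousands-to-one down to even.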

So SB acts as if she follows the self-indication assumption (SIA), while the PP acts as if he follows the self-sampling assumption (SSA). This remains true if we change the setup so that only one agent is given a betting opportunity in the tails universe. Then the objective probability of any one agent being asked is low, so both SB and PP model the "objective probability" of the tails world, given that they have been asked to bet, as being low. However, SB gains utility if any of her copies is asked to bet and receives a profit, so the strategy "if I'm offered $1 if I guess correctly whether the coin is heads or tails, I will say tails" gets her $1*0.5 utility whether or not she is the specific one who is asked. Betting heads nets her the same result, so SB will give SIA 1:1 odds in this case.

On the other hand, the PP will only gain utility in the very specific world where he himself is asked to bet. So his gain from the updateless "if I'm offered $1 if I guess correctly whether the coin is heads or tails, I will say tails" is tiny, as he's unlikely to be asked to bet. Hence he will offer the SSA odds that make heads a much more "likely" proposition.
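
The single-offer variant can be sketched the same way. This assumes that in the heads world the sole agent is the one offered the bet, and that in the tails world the offer goes to one of `n_tails` agents chosen uniformly at random; both the uniform choice and the names below are assumptions made for illustration.

```python
from fractions import Fraction

def single_offer_value(bet, n_tails, pooled):
    """Value of the policy 'if offered the bet, say `bet`' when only one agent is asked."""
    p_world = Fraction(1, 2)
    if bet == "heads":
        # One agent exists and is asked, so the $1 always reaches "me".
        p_payout_counts = Fraction(1)
    elif pooled:
        # SB values the $1 whichever copy collects it.
        p_payout_counts = Fraction(1)
    else:
        # PP values the $1 only if he is the one asked.
        p_payout_counts = Fraction(1, n_tails)
    return p_world * p_payout_counts

n = 1000
sb_odds = single_offer_value("heads", n, True) / single_offer_value("tails", n, True)
pp_odds = single_offer_value("heads", n, False) / single_offer_value("tails", n, False)
print(sb_odds, pp_odds)  # heads-over-tails ratios for SB and PP
```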

The Doomsday argument

Now, using SSA odds brings us back into the realm of the classical Doomsday argument. How is it that Sleeping Beauty is immune to the Doomsday argument while the Presumptuous Philosopher is not? Which one is right; is the world really about to end?

Asking about probabilities independently of decisions is meaningless here; instead, we can ask what would agents decide in particular cases. It's not surprising that agents will reach different decisions on such questions as, for instance, existential risk mitigation, if they have different preferences.

Let's do a very simplified model, where there are two agents in the world, and one of them is approached at random to see if they would pay $Y to add a third agent. Each agent derives a (non-indexical) utility of $1 for the presence of this third agent, and nothing else happens in the world to increase or decrease anyone's utility.

First, let's assume that each agent is selfish about their indexical utility (their cash in the hand). If the decision is to not add a third agent, all will get $0 utility. If the decision is to add a third agent, then there are three agents in the world, and one of them will be approached to lose $Y. Hence the expected utility is $(1-Y/3).

Now let us assume the agents are altruistic towards each other's indexical utilities. Then the expected utility of not adding a third agent is still $0. If the decision is to add a third agent, then there are three agents in the world, and one of them will be approached to lose $Y - but all will value that loss at the same amount. Hence the expected utility is $(1-Y).

So if $Y=$2, for instance, the "selfish" agents will add the third agent, and the "altruistic" ones will not. So generalising this to more complicated models describing existential risk mitigation schemes, we would expect SB-type agents to behave differently to PP-types in most models. There is no sense in asking which one is "right" and which one gives the more accurate "probability of doom"; instead ask yourself which better corresponds to your own utility model, hence what your decision will be.
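
The two expected utilities from this toy model, $(1-Y/3) and $(1-Y), can be compared directly. The sketch below simply encodes those two formulas as stated; the function names are illustrative assumptions.

```python
from fractions import Fraction

def utility_of_adding(y, altruistic):
    """Expected utility of the decision 'add a third agent' at price $y."""
    presence_value = Fraction(1)           # non-indexical $1 for the third agent existing
    if altruistic:
        expected_cost = Fraction(y)        # everyone values the $y loss in full
    else:
        expected_cost = Fraction(y, 3)     # selfish: $y weighted by the 1/3 chance of paying
    return presence_value - expected_cost

def should_add(y, altruistic):
    # Not adding the agent is worth $0 in both models.
    return utility_of_adding(y, altruistic) > 0

print(should_add(2, altruistic=False), should_add(2, altruistic=True))
```

Under these formulas the selfish agents add the third agent for any Y below 3, while the altruistic ones stop at Y below 1.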

Psy-Kosh's non-anthropic problem

Cousin_it has a rephrasing of Psy-Kosh's non-anthropic problem to which updateless anthropics can be illustratively applied:

You are one of a group of 10 people who care about saving African kids. You will all be put in separate rooms, then I will flip a coin. If the coin comes up heads, a random one of you will be designated as the "decider". If it comes up tails, nine of you will be designated as "deciders". Next, I will tell everyone their status, without telling the status of others. Each decider will be asked to say "yea" or "nay". If the coin came up tails and all nine deciders say "yea", I donate $1000 to VillageReach. If the coin came up heads and the sole decider says "yea", I donate only $100. If all deciders say "nay", I donate $700 regardless of the result of the coin toss. If the deciders disagree, I don't donate anything.

We'll set aside the "deciders disagree" and assume that you will all reach the same decision. The point of the problem was to illustrate a supposed preference inversion: if you coordinate ahead of time, you should all agree to say "nay", but after you have been told you're a decider, you should update in the direction of the coin coming up tails, and say "yea".

From the updateless perspective, however, there is no mystery here: the strategy "if I were a decider, I would say nay" maximises utility both for the deciders and the non-deciders.

But what if the problem were rephrased in a more selfish way, with the non-deciders not getting any utility from the setup (maybe they don't get to see the photos of the grateful saved African kids), while the deciders got the same utility as before? Then the strategy "if I were a decider, I would say yea" maximises your expected utility, because non-deciders get nothing, thus reducing the expected utility gains and losses in the world where the coin came out tails. This is similar to SIA odds, again.
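
Both versions of the problem can be evaluated with one short sketch. It assumes all deciders give the same answer (as stated above) and that each of the 10 people is equally likely to be a decider in a given world; the `shared` flag and the names are hypothetical, introduced only to switch between the two utility functions.

```python
from fractions import Fraction

GROUP_SIZE = 10
NAY_DONATION = 700
# (probability of world, number of deciders, donation if all deciders say "yea")
WORLDS = [
    (Fraction(1, 2), 1, 100),    # heads
    (Fraction(1, 2), 9, 1000),   # tails
]

def strategy_value(answer, shared):
    """Expected utility of the policy 'if I were a decider, I would say `answer`'."""
    total = Fraction(0)
    for p_world, deciders, yea_donation in WORLDS:
        donation = yea_donation if answer == "yea" else NAY_DONATION
        # Shared utility: everyone values the donation.
        # Selfish version: I only get utility if I am one of the deciders.
        weight = Fraction(1) if shared else Fraction(deciders, GROUP_SIZE)
        total += p_world * weight * donation
    return total

for shared in (True, False):
    print(shared, strategy_value("yea", shared), strategy_value("nay", shared))
```

With shared utility "nay" comes out ahead (700 against 550); in the selfish version "yea" does (455 against 350).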

That second model is similar to the way I argued for SIA with agents getting created and destroyed. That post has been superseded by this one, which pointed out the flaw in the argument which was (roughly speaking) not considering setups like Psy-Kosh's original model. So once again, whether utility is broadly shared or not affects the outcome of the decision.

The Anthropic Trilemma

Eliezer's anthropic trilemma was an interesting puzzle involving probabilities, copying, and subjective anticipation. It inspired me to come up with a way of spreading utility across multiple copies which was essentially a Sleeping Beauty copy-altruistic model. The decision process going with it is then the same as the updateless decision process outlined here. Though initially it was phrased in terms of SIA probabilities and individual impact, the isomorphism between the two can be seen here.

Revisiting the Anthropic Trilemma II: axioms and assumptions

4 Stuart_Armstrong 16 February 2011 09:42AM

tl;dr: I present four axioms for anthropic reasoning under copying/deleting/merging, and show that these result in a unique way of doing it: averaging non-indexical utility across copies, adding indexical utility, and having all copies being mutually altruistic.

Some time ago, Eliezer constructed an anthropic trilemma, where standard theories of anthropic reasoning seemed to come into conflict with subjective anticipation. rwallace subsequently argued that subjective anticipation was not ontologically fundamental, so we should not expect it to work outside the narrow confines of everyday experience, and Wei illustrated some of the difficulties inherent in "copy-delete-merge" types of reasoning.

Wei also made the point that UDT shifts the difficulty in anthropic reasoning away from probability and onto the utility function, and ata argued that neither the probabilities nor the utility function are fundamental, that it was the decisions that resulted from them that were important - after all, if two theories give the same behaviour in all cases, what grounds do we have for distinguishing them? I then noted that this argument could be extended to subjective anticipation: instead of talking about feelings of subjective anticipation, we could replace it by questions such as "would I give up a chocolate bar now for one of my copies to have two in these circumstances?"

I then made a post where I applied my current intuitions to the anthropic trilemma, and showed how this results in complete nonsense, despite the fact that I used a bona fide utility function. What we need are some sensible criteria by which to divide utility and probability between copies, and this post is an attempt to figure that out. The approach is similar to expected utility, where a quartet of natural axioms forced all decision processes to have a single format.

The assumptions are:

  1. No intrinsic value in the number of copies
  2. No preference reversals
  3. All copies make the same personal indexical decisions
  4. No special status to any copy.


The Anthropic Trilemma

24 Eliezer_Yudkowsky 27 September 2009 01:47AM

Speaking of problems I don't know how to solve, here's one that's been gnawing at me for years.

The operation of splitting a subjective worldline seems obvious enough - the skeptical initiate can consider the Ebborians, creatures whose brains come in flat sheets and who can symmetrically divide down their thickness.  The more sophisticated need merely consider a sentient computer program: stop, copy, paste, start, and what was one person has now continued on in two places.  If one of your future selves will see red, and one of your future selves will see green, then (it seems) you should anticipate seeing red or green when you wake up with 50% probability.  That is, it's a known fact that different versions of you will see red, or alternatively green, and you should weight the two anticipated possibilities equally.  (Consider what happens when you're flipping a quantum coin: half your measure will continue into either branch, and subjective probability will follow quantum measure for unknown reasons.)

But if I make two copies of the same computer program, is there twice as much experience, or only the same experience?  Does someone who runs redundantly on three processors, get three times as much weight as someone who runs on one processor?

Let's suppose that three copies get three times as much experience.  (If not, then, in a Big universe, large enough that at least one copy of anything exists somewhere, you run into the Boltzmann Brain problem.)

Just as computer programs or brains can split, they ought to be able to merge.  If we imagine a version of the Ebborian species that computes digitally, so that the brains remain synchronized so long as they go on getting the same sensory inputs, then we ought to be able to put two brains back together along the thickness, after dividing them.  In the case of computer programs, we should be able to perform an operation where we compare each two bits in the program, and if they are the same, copy them, and if they are different, delete the whole program.  (This seems to establish an equal causal dependency of the final program on the two original programs that went into it.  E.g., if you test the causal dependency via counterfactuals, then disturbing any bit of the two originals, results in the final program being completely different (namely deleted).)
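
The merge operation for programs described above can be sketched in a few lines. Representing a program as a `bytes` object and a deleted program as `None` are assumptions made for illustration, and the comparison runs byte by byte rather than bit by bit, which gives the same result.

```python
from typing import Optional

def merge(a: bytes, b: bytes) -> Optional[bytes]:
    """Merge two program images: copy matching units, delete everything on any mismatch."""
    if len(a) != len(b):
        return None                 # images of different length cannot match everywhere
    merged = bytearray()
    for x, y in zip(a, b):
        if x != y:
            return None             # a single difference deletes the whole program
        merged.append(x)            # identical units are copied into the result
    return bytes(merged)

original = b"sentient program state"
print(merge(original, original) == original)   # True: identical copies merge
print(merge(original, b"Sentient program state"))  # None: one changed byte deletes it
```

Disturbing any unit of either input changes the output from the full program to nothing, which is the equal counterfactual dependence the paragraph points at.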

So here's a simple algorithm for winning the lottery:

continue reading »

Timeless Decision Theory: Problems I Can't Solve

39 Eliezer_Yudkowsky 20 July 2009 12:02AM

Suppose you're out in the desert, running out of water, and soon to die - when someone in a motor vehicle drives up next to you.  Furthermore, the driver of the motor vehicle is a perfectly selfish ideal game-theoretic agent, and even further, so are you; and what's more, the driver is Paul Ekman, who's really, really good at reading facial microexpressions.  The driver says, "Well, I'll convey you to town if it's in my interest to do so - so will you give me $100 from an ATM when we reach town?"

Now of course you wish you could answer "Yes", but as an ideal game theorist yourself, you realize that, once you actually reach town, you'll have no further motive to pay off the driver.  "Yes," you say.  "You're lying," says the driver, and drives off leaving you to die.

If only you weren't so rational!

This is the dilemma of Parfit's Hitchhiker, and the above is the standard resolution according to mainstream philosophy's causal decision theory, which also two-boxes on Newcomb's Problem and defects in the Prisoner's Dilemma.  Of course, any self-modifying agent who expects to face such problems - in general, or in particular - will soon self-modify into an agent that doesn't regret its "rationality" so much.  So from the perspective of a self-modifying-AI-theorist, classical causal decision theory is a wash.  And indeed I've worked out a theory, tentatively labeled "timeless decision theory", which covers these three Newcomblike problems and delivers a first-order answer that is already reflectively consistent, without need to explicitly consider such notions as "precommitment".  Unfortunately this "timeless decision theory" would require a long sequence to write up, and it's not my current highest writing priority unless someone offers to let me do a PhD thesis on it.

However, there are some other timeless decision problems for which I do not possess a general theory.

For example, there's a problem introduced to me by Gary Drescher's marvelous Good and Real (OOPS: The below formulation was independently invented by Vladimir Nesov; Drescher's book actually contains a related dilemma in which box B is transparent, and only contains $1M if Omega predicts you will one-box whether B appears full or empty, and Omega has a 1% error rate) which runs as follows:

Suppose Omega (the same superagent from Newcomb's Problem, who is known to be honest about how it poses these sorts of dilemmas) comes to you and says:

"I just flipped a fair coin.  I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000.  And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads.  The coin came up heads - can I have $1000?"

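For reference, the dollar amounts in Omega's offer can be compared at the level of policies chosen before the coin is flipped. This is only a sketch under loud assumptions: Omega's prediction is taken to be perfectly accurate, utility is taken to be linear in dollars, and the function name is hypothetical. It does not address what to do once you already know the coin came up heads, which is the hard part of the problem.

```python
from fractions import Fraction

def policy_value(pays_on_heads: bool) -> Fraction:
    """Expected dollars, before the flip, of being an agent with the given policy."""
    p = Fraction(1, 2)
    heads_outcome = -1000 if pays_on_heads else 0          # asked for $1000 on heads
    tails_outcome = 1_000_000 if pays_on_heads else 0      # rewarded only if predicted to pay
    return p * heads_outcome + p * tails_outcome

print(policy_value(True), policy_value(False))
```
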