## An attempt to dissolve subjective expectation and personal identity

*I attempt to figure out a way to dissolve the concepts of 'personal identity' and 'subjective expectation' down to the level of cognitive algorithms, in a way that would let one bite the bullets of the anthropic trilemma. I proceed by considering four clues which seem important: 1) the evolutionary function of personal identity, 2) a sense of personal identity being really sticky, 3) an undefined personal identity causing undefined behavior in our decision-making machinery, and 4) our decision-making machinery being more strongly grounded in our subjective expectation than in abstract models. Taken together, these seem to suggest a solution.*

I ended up re-reading some of the debates about the anthropic trilemma, and it struck me as odd that, aside from a few references to personal identity being an evolutionary adaptation, there seemed to be no attempt to reduce the concept to the level of cognitive algorithms. Several commenters thought that there wasn't really any problem, and Eliezer asked them to explain why the claim of there not being any problem regardless violated the intuitive rules of subjective expectation. That seemed like a very strong indication that the question needs to be dissolved, but almost none of the attempted answers seemed to do that, instead trying to solve the question via decision theory without ever addressing the core issue of subjective expectation. rwallace's I-less Eye argued - I believe correctly - that subjective anticipation isn't ontologically fundamental, but still didn't address the question of why it feels like it is.

Here's a sketch of a dissolvement. It seems relatively convincing to me, but I'm not sure how others will take it, so let's give it a shot. Even if others find it incomplete, it should at least help provide clues that point towards a better dissolvement.

**Clue 1: The evolutionary function of personal identity.**

Let's first consider the *evolutionary* function. Why have we *evolved* a sense of personal identity?

The first answer that always comes to everyone's mind is that our brains have evolved for the task of spreading our genes, which involves surviving at least for as long as it takes to reproduce. Simpler neural functions, like maintaining a pulse and having reflexes, obviously do fine without a concept of personal identity. But if we wish to use abstract, explicit reasoning to advance our own interests, we need some definition for exactly *whose* interests it is that our reasoning process is supposed to be optimizing. So evolution comes up with a fuzzy sense of personal identity, so that optimizing the interests of this identity also happens to optimize the interests of the organism in question.

That's simple enough, and this point was already made in the discussions so far. But that doesn't feel like it would resolve our confusion yet, so we need to look at the way that personal identity is actually implemented in our brains. What is the *cognitive* function of personal identity?

**Clue 2: A sense of personal identity is really sticky.**

Even people who disbelieve in personal identity don't really seem to disalieve it: for the most part, they're just as likely to be nervous about their future as anyone else. Even advanced meditators who go out trying to dissolve their personal identity seem to still retain some form of it. PyryP claims that at one point, he reached a stage in meditation where the experience of “somebody who experiences things” shattered and he could turn it entirely off, or attach it to something entirely different, such as a nearby flower vase. But then the experience of having a self began to come back: it was as if the brain was hardwired to maintain one, and to reconstruct it whenever it was broken. I asked him to comment on that for this post, and he provided the following:

## Fundamentals of kicking anthropic butt

**Introduction**

An anthropic problem is one where the very fact of your existence tells you something. "I woke up this morning, therefore the earth did not get eaten by Galactus while I slumbered." Applying your existence to certainties like that is simple - if an event would have stopped you from existing, your existence tells you that it hasn't happened. If something would only kill you 99% of the time, though, you have to use probability instead of deductive logic. Usually, it's pretty clear what to do. You simply apply Bayes' rule: the probability of the world getting eaten by Galactus last night is equal to the prior probability of Galactus-consumption, times the probability of me waking up given that the world got eaten by Galactus, divided by the probability that I wake up at all. More exotic situations also show up under the umbrella of "anthropics," such as getting duplicated or forgetting which person you are. Even if you've been duplicated, you can still assign probabilities. If there are a hundred copies of you in a hundred-room hotel and you don't know which one you are, don't bet too much that you're in room number 68.
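
To make the Bayes' rule step concrete, here is a minimal Python sketch. The function name and all of the numbers (a one-in-a-million prior, a 1% chance of waking up after a Galactus visit) are assumptions made up for illustration, not values from the problem.

```python
def posterior_eaten_given_awake(prior_eaten, p_wake_given_eaten, p_wake_given_not_eaten):
    """Bayes' rule: P(eaten | awake) = P(eaten) * P(awake | eaten) / P(awake)."""
    # P(awake) by the law of total probability over the two hypotheses.
    p_wake = (prior_eaten * p_wake_given_eaten
              + (1 - prior_eaten) * p_wake_given_not_eaten)
    return prior_eaten * p_wake_given_eaten / p_wake


# Hypothetical inputs: Galactus is rare, and being eaten kills you 99% of the time.
posterior = posterior_eaten_given_awake(
    prior_eaten=1e-6,
    p_wake_given_eaten=0.01,
    p_wake_given_not_eaten=1.0,
)

# The hundred-room hotel: with no information distinguishing the copies,
# each room gets the same probability.
p_room_68 = 1 / 100
```

Waking up makes the Galactus hypothesis less likely than its prior, since waking is much more probable in worlds where nothing ate the planet.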

But this last sort of problem is harder, since it's not just a straightforward application of Bayes' rule. You have to determine the probability just from the information in the problem. Thinking in terms of information and symmetries is a useful problem-solving tool for getting probabilities in anthropic problems, which are simple enough to use it and confusing enough to need it. So first we'll cover what I mean by thinking in terms of information, and then we'll use this to solve a confusing-type anthropic problem.

## Colonization models: a tutorial on computational Bayesian inference (part 2/2)

**Recap**

Part 1 was a tutorial for programming a simulation for the emergence and development of intelligent species in a universe 'similar to ours.' In part 2, we will use the model developed in part 1 to evaluate different explanations of the *Fermi paradox*. However, keep in mind that the purpose of this two-part series is for showcasing useful methods, not for obtaining serious answers.

We summarize the model given in part 1:

SIMPLE MODEL FOR THE UNIVERSE

- The universe is represented by the set of all points in Cartesian 4-space which are of Euclidean distance 1 from the origin (that is, the 3-sphere). The distance between two points is taken to be the Euclidean distance (an approximation to the spherical distance which is accurate at small scales).
- The lifespan of the universe consists of 1000 time steps.
- A photon travels **s=0.0004** units in a time step.
- At the end of each time step, there is a chance that a Type 0 civilization will spontaneously emerge in an uninhabited region of space. The base rate for civilization birth is controlled by the parameter **a**. But this base rate is multiplied by the proportion of the universe which remains uncolonized by Type III civilizations.
- In each time step, a Type 0 civilization has a probability **b** of self-destructing, a probability **c** of transitioning to a non-expansionist Type IIa civilization, and a probability **d** of transitioning to a Type IIb civilization.
- Observers can detect all Type II and Type III civilizations within their past light cones.
- In each time step, a Type IIb civilization has a probability **e** of transitioning to an expansionist Type III civilization.
- In each time step, all Type III civilizations colonize space in all directions, expanding their sphere of colonization by **k * s** units per time step.
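
As a reminder of how these rules translate into code, here is a minimal Python sketch of two of them: the per-step fate of a Type 0 civilization and the effective birth rate. The function names are hypothetical, and treating the three transitions as mutually exclusive outcomes of a single draw (so that b + c + d ≤ 1) is an assumption of this sketch rather than something the summary spells out.

```python
import random


def step_type0(rng, b, c, d):
    """Advance one Type 0 civilization by one time step.

    Returns "dead", "IIa", "IIb", or "0" (unchanged).
    Assumes the outcomes are mutually exclusive and b + c + d <= 1.
    """
    u = rng.random()  # uniform draw in [0, 1)
    if u < b:
        return "dead"
    if u < b + c:
        return "IIa"
    if u < b + c + d:
        return "IIb"
    return "0"


def birth_rate(a, uncolonized_fraction):
    """Base rate a, scaled by the share of space not yet held by Type III."""
    return a * uncolonized_fraction


rng = random.Random(0)  # fixed seed so runs are repeatable
fates = [step_type0(rng, b=0.01, c=0.005, d=0.005) for _ in range(1000)]
```

The parameter values passed in the last line are placeholders, not recommendations.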

**Section III. Inferential Methodology**

In this section, no apologies are made for assuming that the reader has a solid grasp of the principles of Bayesian reasoning. Those currently following the tutorial from Part 1 may find it a good idea to skip to Section IV first.

To dodge the philosophical controversies surrounding anthropic reasoning, we will employ an *impartial observer model.* Like Jaynes, we introduce a robot which is capable of Bayesian reasoning, but here we imagine a model in which such a robot is instantaneously created and *randomly injected* into the universe at a random point in space, and at a random time point chosen uniformly from 1 to 1000 (and the robot is aware that it is created via this mechanism). We limit ourselves to asking what kind of inferences this robot would make in a given situation. Interestingly, the inferences made by this robot will turn out to be quite similar to the inferences that would be made under the self-indication assumption.
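
The random injection step can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes that "a random point in space" means a point drawn uniformly from the 3-sphere, which can be obtained by normalizing a 4-dimensional standard Gaussian vector.

```python
import math
import random


def inject_observer(rng, lifespan=1000):
    """Return a (point, time) pair for a randomly injected robot."""
    # Normalizing an isotropic Gaussian vector gives a uniform point on the sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(x * x for x in v))
    point = tuple(x / norm for x in v)
    # Time step chosen uniformly from 1 to lifespan, both ends included.
    t = rng.randint(1, lifespan)
    return point, t


rng = random.Random(0)
point, t = inject_observer(rng)
```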

## Colonization models: a programming tutorial (Part 1/2)

**Introduction**

Are we alone in the universe? How likely is our species to survive the transition from a Type 0 to a Type II civilization? The answers to these questions would be of immense interest to our race; however, we have few tools to reason about these questions. This does not stop us from wanting to find answers to these questions, often by employing controversial principles of inference such as 'anthropic reasoning.' The reader can find a wealth of stimulating discussion about anthropic reasoning at Katja Grace's blog, the site from which this post takes its inspiration. The purpose of this post is to give a quantitatively oriented approach to anthropic reasoning, demonstrating how computer simulations and Bayesian inference can be used as tools for exploration.

The central mystery we want to examine is the *Fermi paradox*: the fact that

- we are an intelligent civilization
- we cannot observe any signs that other intelligent civilizations ever existed in the universe

One explanation for the Fermi paradox is that we are the only intelligent civilization in the universe. A far more chilling explanation is that intelligent civilizations emerge quite frequently, but that all other intelligent civilizations that have come before us ended up destroying themselves before they could manage to make their mark on their universe.

We can reason about which of the above two explanations is more likely if we have the audacity to assume *a model* for the emergence and development of civilizations in a universe 'similar to ours.' In such a model, it is usually useful to distinguish different 'types' of civilizations. Type 0 civilizations are civilizations with levels of technology similar to our own. If a Type 0 civilization survives long enough and accumulates enough scientific knowledge, it can make a transition to a Type I civilization--a civilization which has attained mastery of its home planet. A Type I civilization, over time, can transition to a Type II civilization if it colonizes its solar system. We would suppose that a nearby civilization would have to have reached Type II in order for its activities to be prominent enough for us to be able to detect them. In the original terminology, a Type III civilization is one which has mastery of its galaxy, but in this post we take it to mean something else.

The simplest model for the emergence and development of civilizations would have to specify the following:

- the rate at which intelligent life appears in universes similar to ours;
- the rate at which these intelligent species transition from Type 0 to Type II, Type III civilizations--or self-destruct in the process;
- the visibility of Type II and Type III civilizations to Type 0 civilizations elsewhere;
- the proportion of advanced civilizations which ultimately adopt expansionist policies;
- the speed at which those Type III civilizations can expand and colonize the universe.

In the model we propose in this post, the above parameters are held constant throughout the entire history of the universe. The importance of the model is that, once given a particular specification of the parameters, we can apply Bayesian inference to see how well the model explains the Fermi paradox. The idea is to simulate many different histories of universes for a given set of parameters, so as to find the *expected number of observers who observe the Fermi paradox* given a particular specification of the parameters. More details about Bayesian inference are given in Part 2 of this tutorial.

This post is targeted at readers who are interested in simulating the emergence and expansion of intelligent civilizations in 'universes similar to ours' but who lack the programming knowledge to code these simulations. In this post we will guide the reader through the design and production of a relatively simple universe model and the methodology for doing 'anthropic' Bayesian inference using the model.

## The Absolute Self-Selection Assumption

There are many confused discussions of anthropic reasoning, both on LW and in surprisingly mainstream literature. In this article I will discuss UDASSA, a framework for anthropic reasoning due to Wei Dai. This framework has serious shortcomings, but at present it is the only one I know which produces reasonable answers to reasonable questions; at the moment it is the only framework which I would feel comfortable using to make a real decision.

I will discuss 3 problems:

1. In an infinite universe, there are infinitely many copies of you (infinitely many of which are Boltzmann brains). How do you assign a measure to the copies of yourself when the uniform distribution is unavailable? Do you rule out spatially or temporally infinite universes for this reason?

2. Naive anthropics ignore the substrate on which a simulation is running and count how many instances of a simulated experience exist (or how many distinct versions of that experience exist). These beliefs are inconsistent with basic intuitions about conscious experience, so we have to abandon something intuitive.

3. The Born probabilities seem mysterious. They can be explained (as well as any law of physics can be explained) by UDASSA.

**Why Anthropic Reasoning?**

When I am trying to act in my own self-interest, I do not know with certainty the consequences of any particular decision. I compare probability distributions over outcomes: an action may lead to one outcome with probability 1/2, and a different outcome with probability 1/2. My brain has preferences between probability distributions built into it.

My brain is not built with the machinery to decide between different universes each of which contains many simulations I care about. My brain can't even really grasp the notion of different copies of me, except by first converting to the language of probability distributions. If I am facing the prospect of being copied, the only way I can grapple with it is by reasoning "I have a 50% chance of remaining me, and a 50% chance of becoming my copy." After thinking in this way, I can hope to intelligently trade off one copy's preferences against the other's using the same machinery which allows me to make decisions with uncertain outcomes.

In order to perform this reasoning in general, I need a better framework for anthropic reasoning. What I want is a probability distribution over all possible experiences (or "observer-moments"), so that I can use my existing preferences to make intelligent decisions in a universe with more than one observer I care about.

I am going to leave many questions unresolved. I don't understand continuity of experience or identity, so I am simply not going to try to be selfish (I don't know how). I don't understand what constitutes conscious experience, so I am not going to try and explain it. I have to rely on a complexity prior, which involves an unacceptable arbitrary choice of a notion of complexity.

**The Absolute Self-Selection Assumption**

A thinker using Solomonoff induction searches for the simplest explanation for its own experiences. It eventually learns that the simplest explanation for its experiences is the description of an external lawful universe in which its sense organs are embedded and a description of that embedding.

As humans using Solomonoff induction, we go on to argue that this external lawful universe is real, and that our conscious experience is a consequence of the existence of certain substructure in that universe. The absolute self-selection assumption discards this additional step. Rather than supposing that the probability of a certain universe depends on the complexity of that universe, it takes as a primitive object a probability distribution over possible experiences.

By the same reasoning that led a normal Solomonoff inductor to accept the existence of an external universe as the best explanation for its experiences, the least complex description of your conscious experience is the description of an external lawful universe and directions for finding the substructure embodying your experience within that universe.

This requires specifying a notion of complexity. I will choose a universal computable distribution over strings for now, to mimic conventional Solomonoff induction as closely as possible (and because I know nothing better). The resulting theory is called UDASSA, for Universal Distribution + ASSA.

**Recovering Intuitive Anthropics**

Suppose I create a perfect copy of myself. Intuitively, I would like to weight the two copies equally. Similarly, my anthropic notion of "probability of an experience" should match up with my intuitive notion of probability. Fortunately, UDASSA recovers intuitive anthropics in intuitive situations.

The shortest description of me is a pair (U, x), where U is a description of my universe and x is a description of where to find me in that universe. If there are two copies of me in the universe, then the experience of each can be described in the same way: (U, x1) and (U, x2) are descriptions of approximately equal complexity, so I weight the experience of each copy equally. The total experience of my copies is weighted twice as much as the total experience of an uncopied individual.

Part of x is a description of how to navigate the randomness of the universe. For example, if the last (truly random) coin I saw flipped came up heads, then in order to specify my experiences you need to specify the result of that coin flip. An equal number of equally complex descriptions point to the version of me who saw heads and the version of me who saw tails.
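
Here is a toy Python sketch of the bookkeeping in the last two paragraphs. It assumes, for illustration only, that the measure of a description is 2 to the minus its length in bits, and the specific lengths are made up.

```python
def weight(description_length_bits):
    """Unnormalized measure of one description under a 2^-length prior."""
    return 2.0 ** -description_length_bits


# Hypothetical lengths: 100 bits for the universe U, 20 bits to locate me in it.
U_BITS, X_BITS = 100, 20

uncopied = weight(U_BITS + X_BITS)

# Two copies: (U, x1) and (U, x2) have (approximately) the same length.
copies = [weight(U_BITS + X_BITS), weight(U_BITS + X_BITS)]

# A truly random coin flip costs one extra bit to specify, for either outcome.
saw_heads = weight(U_BITS + X_BITS + 1)
saw_tails = weight(U_BITS + X_BITS + 1)
```

Each copy gets equal weight, the pair together gets twice the weight of an uncopied individual, and the heads and tails versions split the pre-flip weight evenly.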

**Problem #1: Infinite Cosmologies**

Modern physics is consistent with infinite universes. An infinite universe contains infinitely many observers (infinitely many of which share all of your experiences so far), and it is no longer sensible to talk about the "uniform distribution" over all of them. You could imagine taking a limit over larger and larger volumes, but there is no particular reason to suspect such a limit would converge in a meaningful sense. One solution that has been suggested is to choose an arbitrary but very large volume of spacetime, and to use a uniform distribution over observers within it. Another solution is to conclude that infinite universes can't exist. Both of these explanations are unsatisfactory.

UDASSA provides a different solution. The probability of an experience depends exponentially on the complexity of specifying it. Just existing in an infinite universe with a short description does not guarantee that you yourself have a short description; you need to specify a position within that infinite universe. For example, if your experiences occur 34908172349823478132239471230912349726323948123123991230 steps after some naturally specified time 0, then the (somewhat lengthy) description of that time is necessary to describe your experiences. Thus the total measure of all observer-moments within a universe is finite.
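
To see why the total stays finite, here is a small Python sketch under a strong simplifying assumption: an observer-moment is specified by a single positive integer position n, written in a prefix-free code. I use the Elias gamma code purely as a convenient example; nothing in UDASSA singles it out.

```python
from fractions import Fraction


def elias_gamma_length(n):
    """Number of bits in the Elias gamma code for a positive integer n."""
    # 2 * floor(log2(n)) + 1, computed exactly with integer arithmetic.
    return 2 * (n.bit_length() - 1) + 1


def total_measure(max_position):
    """Sum of 2^-length over all positions 1..max_position."""
    return sum(
        Fraction(1, 2 ** elias_gamma_length(n))
        for n in range(1, max_position + 1)
    )


partial = total_measure(1023)
```

The partial sums approach 1 and never exceed it, however far out the positions go, so distant observers exist but carry exponentially little measure.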

**Problem #2: Splitting Simulations**

Consider a computer which is 2 atoms thick running a simulation of you. Suppose this computer can be divided down the middle into two 1 atom thick computers which would both run the same simulation independently. We are faced with an unfortunate dichotomy: either the 2 atom thick simulation has the same weight as two 1 atom thick simulations put together, or it doesn't.

In the first case, we have to accept that some computer simulations count for more, even if they are running the same simulation (or we have to de-duplicate the set of all experiences, which leads to serious problems with Boltzmann brains). In this case, we are faced with the problem of comparing different substrates, and it seems impossible not to make arbitrary choices.

In the second case, we have to accept that the operation of dividing the 2 atom thick computer has moral value, which is even worse. Where exactly does the transition occur? What if each layer of the 2 atom thick computer can run independently before splitting? Is physical contact really significant? What about computers that aren't physically coherent? What if two 1 atom thick computers periodically synchronize themselves and self-destruct if they aren't synchronized: does this synchronization effectively destroy one of the copies? I know of no way to accept this possibility without extremely counter-intuitive consequences.

UDASSA implies that simulations on the 2 atom thick computer count for twice as much as simulations on the 1 atom thick computer, because they are easier to specify. Given a description of one of the 1 atom thick computers, then there are two descriptions of equal complexity that point to the simulation running on the 2 atom thick computer: one description pointing to each layer of the 2 atom thick computer. When a 2 atom thick computer splits, the total number of descriptions pointing to the experience it is simulating doesn't change.

**Problem #3: The Born Probabilities**

A quantum mechanical state can be described as a linear combination of "classical" configurations. For some reason we appear to experience ourselves as being in one of these classical configurations with probability proportional to the coefficient of that configuration *squared*. These probabilities are called the Born probabilities, and are sometimes described either as a serious problem for MWI or as an unresolved mystery of the universe.

What happens if we apply UDASSA to a quantum universe? For one, the existence of an observer within the universe doesn't say anything about conscious experience. We need to specify an algorithm for extracting a description of that observer from a description of the universe.

Consider the randomized algorithm A: compute the state of the universe at time t, then sample a classical configuration with probability proportional to its squared inner product with the universal wavefunction.

Consider the randomized algorithm B: compute the state of the universe at time t, then sample a classical configuration with probability proportional to its inner product with the universal wavefunction.

Using either A or B, we can describe a single experience by specifying a random seed, and picking out that experience within the classical configuration output by A or B using that random seed. If this is the shortest explanation of an experience, the probability of an experience is proportional to the number of random seeds which produce classical configurations containing it.
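
A toy Python sketch of the two selection rules, for a state with finitely many classical configurations. The function name and the two-configuration state are hypothetical, and I take "inner product" to mean its magnitude so that algorithm B yields non-negative weights; that reading is an assumption of this sketch.

```python
def selection_probabilities(amplitudes, power):
    """Normalize |amplitude|**power over a finite list of configurations.

    power=2 corresponds to algorithm A, power=1 to algorithm B.
    """
    weights = [abs(a) ** power for a in amplitudes]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical state: amplitudes 0.6 and 0.8, so the squares sum to 1.
amplitudes = [0.6, 0.8]

prob_a = selection_probabilities(amplitudes, power=2)  # squared magnitudes
prob_b = selection_probabilities(amplitudes, power=1)  # plain magnitudes
```

For a normalized state the weights under A already sum to 1, while B needs the explicit division by the total, and the two rules assign different probabilities to the same configurations.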

The universe as we know it is typical for an output of A but completely improbable as an output of B. For example, the observed behavior of stars is consistent with almost all observations weighted according to algorithm A, but with almost no observations weighted according to algorithm B. Algorithm A constitutes an immensely better description of our experiences, in the same sense that quantum mechanics constitutes an immensely better description of our experiences than classical physics.

You could also imagine an algorithm C, which uses the same selection as algorithm B to point to the Everett branch containing a physicist about to do an experiment, but then uses algorithm A to describe the experiences of the physicist after doing that experiment. This is a horribly complex way to specify an experience, however, for exactly the same reason that a Solomonoff inductor places very low probability on the laws of physics suddenly changing for just this one experiment.

Of course this leaves open the question of "why the Born probabilities and not some other rule?" Algorithm B is a valid way of specifying observers, though they would look exactly as foreign as observers with different rules of physics (Wei Dai has suggested that the structures specified by algorithm B are not even self-aware as justification for the Born rule). The fact that we are described by algorithm A rather than B is no more or less mysterious than the fact that the laws of physics are like *so* instead of some other way.

In the same way that we can retroactively justify our laws of physics by appealing to their elegance and simplicity (in a sense we don't yet really understand) I suspect that we can justify selection according to algorithm A rather than algorithm B. In an infinite universe, algorithm B doesn't even work (because the sum of the inner products of the universal wavefunction with the classical configurations is infinite) and even in a finite universe algorithm B necessarily involves the additional step of normalizing the probability distribution or else producing nonsense. Moreover, algorithm A is a nicer mathematical object than algorithm B when the evolution of the wavefunction is unitary, and so the same considerations that suggest elegant laws of physics suggest algorithm A over B (or some other alternative).

Note that this is *not* the core of my explanation of the Born probabilities; in UDASSA, choosing a selection procedure is just as important as describing the universe, and so some explicit sort of observer selection is a necessary part of the laws of physics. We predict the Born rule to hold in the future because it has held in the past, just like we expect the laws of physics to hold in the future because they have held in the past.

In summary, if you use Solomonoff induction to predict what you will see next based on everything you have seen so far, your predictions about the future will be consistent with the Born probabilities. You only get in trouble when you use Solomonoff induction to predict what the universe contains, and then get bogged down in the question "Given that the universe contains all of these observers, which one should I expect to be *me*?"

## Updateless anthropics

Three weeks ago, I set out to find a new theory of anthropics, to try and set decision theory on a firm footing with respect to copying, deleting copies, merging them, correlated decisions, and the presence or absence of extra observers. I've since come full circle, and realised that UDT already has a built-in anthropic theory, that resolves a lot of the problems that had been confusing me.

The theory is simple, and is essentially a rephrasing of UDT: if you are facing a decision X, and trying to figure out the utility of X=a for some action a, then calculate the full expected utility of X being a, given the objective probabilities of each world (including those in which you don't exist).

As usual, you have to consider the consequences of X=a for all agents who will make the same decision as you, whether they be exact copies, enemies, simulations or similar-minded people. However, your utility will have to do more work than is usually realised: notions such as selfishness or altruism with respect to your copies have to be encoded in the utility function, and will result in substantially different behaviour.

The rest of the post is a series of case studies illustrating this theory. Utility is assumed to be linear in cash for convenience.

**Sleeping with the Presumptuous Philosopher**

The first test case is the Sleeping Beauty problem.

In its simplest form, this involves a coin toss; if it comes out heads, one copy of Sleeping Beauty is created. If it comes out tails, two copies are created. Then the copies are asked at what odds they would be prepared to bet that the coin came out tails. You can assume either that the different copies care for each other in the manner I detailed here, or more simply that *all* winnings will be kept by a future merged copy (or an approved charity). Then the algorithm is simple: the two worlds have equal probability. Let X be the decision where Sleeping Beauty decides between a contract that pays out $1 if the coin is heads, versus one that pays out $1 if the coin is tails. If X="heads" (to use an obvious shorthand), then Sleeping Beauty will expect to make $1*0.5, as she is offered the contract once. If X="tails", then the total return of that decision is $1*2*0.5, as copies of her will be offered the contract twice, and they will all make the same decision. So Sleeping Beauty will follow the SIA 2:1 betting odds of tails over heads.

Variants such as "extreme Sleeping Beauty" (where thousands of copies are created on tails) will behave in the same way; if it feels counter-intuitive to bet at thousands-to-one odds that a fair coin landed tails, it's the fault of expected utility itself, as the rewards of being right dwarf the costs of being wrong.

But now let's turn to the Presumptuous Philosopher, a thought experiment that is often confused with Sleeping Beauty. Here we have exactly the same setup as "extreme Sleeping Beauty", but the agents (the Presumptuous Philosophers) are mutually selfish. Here the return to X="heads" remains $1*0.5. However the return to X="tails" is also $1*0.5, since even if all the Presumptuous Philosophers in the "tails" universe bet on "tails", each one will still only get $1 in utility. So the Presumptuous Philosopher should only take even (1:1) SSA betting odds on the result of the coin flip.
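
The two calculations so far differ only in how winnings are counted across copies, which a short Python sketch makes explicit. The function and parameter names are my own shorthand, not part of UDT.

```python
def expected_return(bet, copies_on_tails, shared_winnings, payout=1.0, p_heads=0.5):
    """Expected utility when every copy makes the same bet ("heads" or "tails")."""
    if bet == "heads":
        # Heads world: a single agent, offered the contract once.
        return payout * p_heads
    # Tails world: shared winnings add up over all copies,
    # while a mutually selfish agent only counts its own payout.
    winners = copies_on_tails if shared_winnings else 1
    return payout * winners * (1 - p_heads)


# Sleeping Beauty: two copies on tails, winnings pooled.
sb_heads = expected_return("heads", copies_on_tails=2, shared_winnings=True)
sb_tails = expected_return("tails", copies_on_tails=2, shared_winnings=True)

# Presumptuous Philosopher: many copies on tails, each selfish.
pp_heads = expected_return("heads", copies_on_tails=1000, shared_winnings=False)
pp_tails = expected_return("tails", copies_on_tails=1000, shared_winnings=False)
```

With pooled winnings the ratio of tails to heads returns is 2:1 (and 1000:1 in the extreme variant); with selfish agents it is 1:1 no matter how many copies exist.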

So SB acts as if she follows the self-indication assumption (SIA), while the PP follows the self-sampling assumption (SSA). This remains true if we change the setup so that one agent is given a betting opportunity in the tails universe. Then the objective probability of any one agent being asked is low, so both SB and PP model the "objective probability" of the tails world, given that they have been asked to bet, as being low. However, SB gains utility if any of her copies is asked to bet and receives a profit, so the strategy "if I'm offered $1 if I guess correctly whether the coin is heads or tails, I will say tails" gets her $1*0.5 utility whether or not she is the specific one who is asked. Betting heads nets her the same result, so SB will give SIA 1:1 odds in this case.

On the other hand, the PP will only gain utility in the very specific world where he himself is asked to bet. So his gain from the updateless "if I'm offered $1 if I guess correctly whether the coin is heads or tails, I will say tails" is tiny, as he's unlikely to be asked to bet. Hence he will offer the SSA odds that make heads a much more "likely" proposition.

**The Doomsday argument**

Now, using SSA odds brings us back into the realm of the classical Doomsday argument. How is it that Sleeping Beauty is immune to the Doomsday argument while the Presumptuous Philosopher is not? Which one is right; is the world really about to end?

Asking about probabilities independently of decisions is meaningless here; instead, we can ask what would agents decide in particular cases. It's not surprising that agents will reach different decisions on such questions as, for instance, existential risk mitigation, if they have different preferences.

Let's do a very simplified model, where there are two agents in the world, and one of them is approached at random to see if they would pay $Y to add a third agent. Each agent derives a (non-indexical) utility of $1 for the presence of this third agent, and nothing else happens in the world to increase or decrease anyone's utility.

First, let's assume that each agent is selfish about their indexical utility (their cash in the hand). If the decision is to not add a third agent, all will get $0 utility. If the decision is to add a third agent, then there are three agents in the world, and one of them will be approached to lose $Y. Hence the expected utility is $(1-Y/3).

Now let us assume the agents are altruistic towards each other's indexical utilities. Then the expected utility of not adding a third agent is still $0. If the decision is to add a third agent, then there are three agents in the world, and one of them will be approached to lose $Y - but all will value that loss at the same amount. Hence the expected utility is $(1-Y).

So if $Y=$2, for instance, the "selfish" agents will add the third agent, and the "altruistic" ones will not. Generalising this to more complicated models describing existential risk mitigation schemes, we would expect SB-type agents to behave differently to PP-types in most models. There is no sense in asking which one is "right" and which one gives the more accurate "probability of doom"; instead ask yourself which better corresponds to your own utility model, hence what your decision will be.
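The arithmetic of this toy model can be checked with a short sketch. It assumes exactly what the text states: not adding the third agent is worth $0, the selfish agents value adding at $(1-Y/3), and the altruistic ones at $(1-Y). The function names are illustrative, not from the original post.

```python
from fractions import Fraction


def selfish_value(cost):
    """Value of adding the third agent for a selfish agent.

    The agent gains 1 from the third agent's presence and has a 1/3
    chance of being the one who pays `cost`.
    """
    return 1 - Fraction(cost) / 3


def altruistic_value(cost):
    """Value of adding the third agent when every agent counts the loss.

    The agent gains 1 and values the payment of `cost` in full,
    whoever ends up making it.
    """
    return 1 - Fraction(cost)


def adds_third_agent(value):
    # Not adding the agent is worth 0, so add only if strictly better.
    return value > 0


for y in (1, 2, 3):
    print(
        f"Y={y}: selfish adds? {adds_third_agent(selfish_value(y))}, "
        f"altruistic adds? {adds_third_agent(altruistic_value(y))}"
    )
```

At Y=2 the two preference types come apart, which is the only point the example is meant to make: the same world model produces different decisions.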

**Psy-Kosh's non-anthropic problem**

Cousin_it has a rephrasing of Psy-Kosh's non-anthropic problem to which updateless anthropics can be illustratively applied:

You are one of a group of 10 people who care about saving African kids. You will all be put in separate rooms, then I will flip a coin. If the coin comes up heads, a random one of you will be designated as the "decider". If it comes up tails, *nine* of you will be designated as "deciders". Next, I will tell everyone their status, without telling the status of others. Each decider will be asked to say "yea" or "nay". If the coin came up tails and all nine deciders say "yea", I donate $1000 to VillageReach. If the coin came up heads and the sole decider says "yea", I donate only $100. If all deciders say "nay", I donate $700 regardless of the result of the coin toss. If the deciders disagree, I don't donate anything.

We'll set aside the "deciders disagree" and assume that you will all reach the same decision. The point of the problem was to illustrate a supposed preference inversion: if you coordinate ahead of time, you should all agree to say "nay", but after you have been told you're a decider, you should update in the direction of the coin coming up tails, and say "yea".

From the updateless perspective, however, there is no mystery here: the strategy "if I were a decider, I would say nay" maximises utility both for the deciders and the non-deciders.

But what if the problem were rephrased in a more selfish way, with the non-deciders not getting any utility from the setup (maybe they don't get to see the photos of the grateful saved African kids), while the deciders got the same utility as before? Then the strategy "if I were a decider, I would say yea" maximises your expected utility, because non-deciders get nothing, thus reducing the expected utility gains and losses in the world where the coin came out tails. This is similar to SIA odds, again.
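Both versions of the problem can be compared numerically. The sketch below is a simplification under stated assumptions: utility is taken to be linear in the donated amount, all deciders give the same answer, and strategies are evaluated before roles are assigned. The function and constant names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
GROUP_SIZE = 10

# Donation for each (coin, unanimous answer), from the problem statement.
DONATION = {
    ("heads", "yea"): 100,
    ("tails", "yea"): 1000,
    ("heads", "nay"): 700,
    ("tails", "nay"): 700,
}
# Number of deciders chosen for each coin result.
DECIDERS = {"heads": 1, "tails": 9}


def shared_utility(answer):
    """Expected utility when everyone values the donation, decider or not."""
    return sum(HALF * DONATION[(coin, answer)] for coin in DECIDERS)


def decider_only_utility(answer):
    """Expected utility when only deciders value the donation.

    You are a decider with probability 1/10 on heads and 9/10 on tails.
    """
    return sum(
        HALF * Fraction(DECIDERS[coin], GROUP_SIZE) * DONATION[(coin, answer)]
        for coin in DECIDERS
    )


for answer in ("yea", "nay"):
    print(answer, shared_utility(answer), decider_only_utility(answer))
```

Under these assumptions "nay" wins when utility is shared (700 against 550) and "yea" wins when only deciders benefit (455 against 350), matching the two conclusions in the text.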

That second model is similar to the way I argued for SIA with agents getting created and destroyed. That post has been superseded by this one, which pointed out the flaw in the argument which was (roughly speaking) not considering setups like Psy-Kosh's original model. So once again, whether utility is broadly shared or not affects the outcome of the decision.

**The Anthropic Trilemma**

Eliezer's anthropic trilemma was an interesting puzzle involving probabilities, copying, and subjective anticipation. It inspired me to come up with a way of spreading utility across multiple copies which was essentially a Sleeping Beauty copy-altruistic model. The decision process going with it is then the same as the updateless decision process outlined here. Though initially it was phrased in terms of SIA probabilities and individual impact, the isomorphism between the two can be seen here.

## Dead men tell tales: falling out of love with SIA

SIA is the Self Indication Assumption, an anthropic theory about how we should reason about the universe given that we exist. I used to love it; the argument that I've found most convincing about SIA was the one I presented in this post. Recently, I've been falling out of love with SIA, and moving more towards a UDT version of anthropics (objective probabilities and total impact of your decision being of a specific type, including in all copies of you and enemies with the same decision process). So it's time I revisit my old post, and find the hole.

The argument rested on the plausible sounding assumption that creating extra copies and killing them is no different from if they hadn't existed in the first place. More precisely, it rested on the assumption that if I was told "You are not one of the agents I am about to talk about. Extra copies were created to be destroyed," it was exactly the same as hearing "Extra copies were created to be destroyed. And you're not one of them."

But I realised that from the UDT/TDT perspective, there is a great difference between the two situations, if I have the time to update decisions in the course of the sentence. Consider the following three scenarios:

- Scenario 1 (SIA):

Two agents are created, then one is destroyed with 50% probability. Each living agent is entirely selfish, with utility linear in money, *and the dead agent gets nothing*. Every survivor will be presented with the same bet. Then you should take the SIA 2:1 odds that you are in the world with two agents. This is the scenario I was assuming.

- Scenario 2 (SSA):

Two agents are created, then one is destroyed with 50% probability. Each living agent is entirely selfish, with utility linear in money, *and the dead agent is altruistic towards his survivor*. This is similar to my initial intuition in this post. Note that all agents have the same utility: "as long as I live, I care about myself, but after I die, I'll care about the other guy", so you can't distinguish them based on their utility. As before, every survivor will be presented with the same bet.

Here, once you have been told the scenario, but before knowing whether anyone has been killed, you should pre-commit to taking 1:1 odds that you are in the world with two agents. And in UDT/TDT precommitting is the same as making the decision.
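To see where the 2:1 and 1:1 figures come from, here is a minimal sketch with a hypothetical bet, since the text does not specify one: each survivor pays `price` for a ticket worth $1 if nobody was destroyed. It also assumes that, when one agent is destroyed, each agent is equally likely to be the victim. The break-even price then gives the odds.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def eu_dead_gets_nothing(price):
    """Scenario 1: expected utility of buying, judged before any killing."""
    two_agent_world = HALF * (1 - price)      # I survive for sure and win
    one_agent_world = HALF * HALF * (-price)  # I am the survivor half the time
    return two_agent_world + one_agent_world


def eu_dead_is_altruistic(price):
    """Scenario 2: the dead agent also cares about the survivor's loss."""
    two_agent_world = HALF * (1 - price)
    one_agent_world = HALF * (-price)  # the loss counts whichever agent I am
    return two_agent_world + one_agent_world


def odds_for_two_agents(price):
    """Convert a break-even ticket price into odds for the two-agent world."""
    return price / (1 - price)


print(eu_dead_gets_nothing(Fraction(2, 3)), odds_for_two_agents(Fraction(2, 3)))
print(eu_dead_is_altruistic(Fraction(1, 2)), odds_for_two_agents(Fraction(1, 2)))
```

The objective probabilities are identical in both functions; only the treatment of the dead agent's utility differs, and that alone moves the break-even price from 2/3 to 1/2.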

## Revisiting the Anthropic Trilemma II: axioms and assumptions

tl;dr: I present four axioms for anthropic reasoning under copying/deleting/merging, and show that these result in a unique way of doing it: averaging non-indexical utility across copies, adding indexical utility, and having all copies being mutually altruistic.

Some time ago, Eliezer constructed an anthropic trilemma, where standard theories of anthropic reasoning seemed to come into conflict with subjective anticipation. rwallace subsequently argued that subjective anticipation was not ontologically fundamental, so we should not expect it to work outside the narrow confines of everyday experience, and Wei illustrated some of the difficulties inherent in "copy-delete-merge" types of reasoning.

Wei also made the point that UDT shifts the difficulty in anthropic reasoning away from probability and onto the utility function, and ata argued that neither the probabilities nor the utility function are fundamental, that it was the decisions that resulted from them that were important - after all, if two theories give the same behaviour in all cases, what grounds do we have for distinguishing them? I then noted that this argument could be extended to subjective anticipation: instead of talking about feelings of subjective anticipation, we could replace it by questions such as "would I give up a chocolate bar now for one of my copies to have two in these circumstances?"

I then made a post where I applied my current intuitions to the anthropic trilemma, and showed how this results in complete nonsense, despite the fact that I used a bona fide utility function. What we need are some sensible criteria by which to divide utility and probability between copies, and this post is an attempt to figure that out. The approach is similar to expected utility, where a quartet of natural axioms forced all decision processes to have a single format.

The assumptions are:

- No intrinsic value in the number of copies
- No preference reversals
- All copies make the same personal indexical decisions
- No special status to any copy.

## Iterated Sleeping Beauty and Copied Minds

Before I move on to a summation post listing the various raised thought experiments and paradoxes related to mind copying, I would like to draw attention to a particular point regarding the notion of "subjective probability".

In my earlier discussion post on the subjective experience of a forked person, I compared the scenario where one copy is awakened in the future to the Sleeping Beauty thought experiment. And really, it describes any such process, because there will inevitably be a time gap, however short, between the time of fork and the copy's subjective awakening: no copy mechanism can be instant.

In the traditional Sleeping Beauty scenario, there are two parties: Beauty and the Experimenter. The Experimenter has access to a sleep-inducing drug that also resets Beauty's memory to the state at t=0. Suppose Beauty is put to sleep at t=0, and then a fair coin is tossed. If the coin comes heads, Beauty is woken up at t=1, permanently. If the coin comes tails, Beauty is woken up at t=1, questioned, memory-wiped, and then woken up again at t=2, this time permanently.

In this experiment, intuitively, Beauty's subjective anticipation of the coin coming tails, without access to any information other than the conditions of the experiment, should be 2/3. I won't be arguing here whether this particular answer is right or wrong: the discussion has been raised many times before, and on Less Wrong as well. I'd like to point out one property of the experiment that differentiates it from other probability-related tasks: *erasure of information*, which renders the whole experiment a non-experiment.
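For reference, the 2/3 figure is what one gets by counting awakenings rather than coin tosses. The sketch below only shows that bookkeeping, assuming one awakening on heads and two on tails as in the setup above; it takes no side on whether this is the right way to count.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Number of awakenings Beauty goes through for each coin result.
AWAKENINGS = {"heads": 1, "tails": 2}


def tails_share_of_awakenings():
    """Expected fraction of awakenings that happen in the tails branch."""
    expected_tails = HALF * AWAKENINGS["tails"]
    expected_total = sum(HALF * count for count in AWAKENINGS.values())
    return expected_tails / expected_total


def tails_share_of_experiments():
    """Fraction of whole experiments in which the coin lands tails."""
    return HALF


print(tails_share_of_awakenings(), tails_share_of_experiments())
```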

In Bayesian theory, the (prior) probability of an outcome is the measure of our anticipation of it to the best of our knowledge. Bayesians think of experiments as a way to get new information, and update their probabilities based on the information gained. However, in the Sleeping Beauty experiment, Beauty gains no new information from waking up at any time, in any outcome. She has the exact same mind-state at any point of awakening that she had at t=0, and is for all intents and purposes the exact same person at any such point. As such, we can ask Beauty, "If we perform the experiment, what is your anticipation of waking up in the branch where the coin landed tails?", and she can give the same answer without actually performing the experiment.

So how does it map to the mind-copying problem? In a very straightforward way.

Let's modify the experiment this way: at t=0, Beauty's state is backed up. Let's suppose that she is then allowed to live her normal life, but the time-slices are large enough that she dies within the course of a single round. (Say, she has a normal human lifespan and the time between successive iterations is 200 years.) However, at t=1, a copy of Beauty is created in the state at which the original was at t=0, a coin is tossed, and if and only if it comes tails, another copy is created at t=2.

If Beauty knows the condition of this experiment, no matter what answer she would give in the classic formulation of the problem, I don't expect it to change here. The two formulations are, as far as I can see, equivalent.

However, in both cases, from the Experimenter's point of view, the branching points are independent events, which allows us to construct scenarios that question the straightforward interpretation of "subjective probability". And for this, I refer to the last experiment in my earlier post.

Imagine you have an indestructible machine that restores one copy of you from backup every 200 years. In this scenario, it seems you should anticipate waking up with equal probability between now and the end of time. But it's inconsistent with the formulation of probability for discrete outcomes: we end up with a diverging series, and as the length of the experiment approaches infinity (ignoring real-world cosmology for the moment), the subjective probability of every individual outcome (finding yourself at t=1, finding yourself at t=2, etc.) approaches 0. The equivalent classic formulation is a setup where the Experimenter is programmed to wake Beauty after every time-slice and unconditionally put her back to sleep.

This is not the only possible "diverging Sleeping Beauty" problem. Suppose that at t=1, Beauty is put back to sleep with probability 1/2 (like in the classic experiment), at t=2 she is put back to sleep with probability 1/3, then 1/4, and so on. In this case, while it seems almost certain that she will eventually wake up permanently (in the same sense that it is "almost certain" that a fair random number generator will eventually output any given value), the expected value is still infinite.

In the case of a converging series of probabilities of remaining asleep - for example, if it's decided by a coin toss at each iteration whether Beauty is put back to sleep, in which case the series is 1/2 + 1/4 + 1/8 + ... = 1 -- Beauty can give a subjective expected value, or the average time at which she expects to be woken up permanently.
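As a rough illustration of the converging case, the sketch below assumes stages are numbered from 1 and that a fair coin at each stage decides whether Beauty stays awake, so she is woken permanently at stage k with probability (1/2)^k. The function names are illustrative.

```python
from fractions import Fraction


def wake_probability(stage):
    """Probability of waking permanently at exactly `stage`."""
    return Fraction(1, 2) ** stage


def partial_total_probability(max_stage):
    """Sum of the wake probabilities up to `max_stage` (tends to 1)."""
    return sum(wake_probability(k) for k in range(1, max_stage + 1))


def partial_expected_stage(max_stage):
    """Partial sum of the expected stage of permanent waking (tends to 2)."""
    return sum(k * wake_probability(k) for k in range(1, max_stage + 1))


for n in (5, 10, 50):
    print(n, float(partial_total_probability(n)), float(partial_expected_stage(n)))
```

Because both partial sums settle down, Beauty can quote a finite expected waking stage here, which is what fails in the diverging examples.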

In a general case, let E_{i} be the event "the experiment continues at stage i" (that is, Beauty is not permanently awakened at stage i, or in the alternate formulation, more copies are created beyond that point). If we extrapolate the notion of "subjective probability" that leads us to the answer 2/3 in the classic formulation, then the definition is meaningful if and only if the series *of objective probabilities* ∑_{i=1..∞} P(E_{i}) converges -- it doesn't have to converge to 1, we'll just need to renormalize the calculations otherwise. Which, given that the randomizing events are independent, simply doesn't have to happen.
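The convergence criterion can be explored numerically. This sketch takes the two sequences of probabilities exactly as the text lists them (1/2, 1/4, 1/8, ... for the coin-toss case and 1/2, 1/3, 1/4, ... for the diverging case) and prints their partial sums; treating those listed values as the P(E_{i}) is an assumption of the example.

```python
from fractions import Fraction


def partial_sum(term, count):
    """Sum of term(1) + term(2) + ... + term(count)."""
    return sum(term(i) for i in range(1, count + 1))


def geometric_term(i):
    """Terms 1/2, 1/4, 1/8, ... from the coin-toss example."""
    return Fraction(1, 2**i)


def harmonic_term(i):
    """Terms 1/2, 1/3, 1/4, ... from the diverging example."""
    return Fraction(1, i + 1)


for n in (10, 100, 1000):
    print(
        n,
        float(partial_sum(geometric_term, n)),
        float(partial_sum(harmonic_term, n)),
    )
```

The first column of sums stays below 1 however many terms are added, while the second keeps growing, roughly like the logarithm of the number of terms.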

Even if we reformulate the experiment in terms of decision theory, it's not clear how it will help us. If the bet is "win 1 utilon if you get your iteration number right", the probability of winning it in a divergent case is 0 at any given iteration. And yet, if all cases are perfectly symmetric information-wise so that you make the same decision over and over again, you'll eventually get the answer right, with exactly one of you winning the bet, no matter what your "decision function" is - even if it's simply something like "return 42;". Even a stopped clock is right sometimes, in this case once.
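The "return 42" point can be made concrete with a small sketch. It assumes every awakened copy runs the same deterministic decision function with no information about its iteration, and that iterations are numbered from 1; `winners` and `always_42` are hypothetical names.

```python
def winners(decide, iterations):
    """Return the iterations at which a copy names its own index."""
    return [i for i in range(1, iterations + 1) if decide() == i]


def always_42():
    """A decision function that ignores everything and answers 42."""
    return 42


for n in (10, 100, 10_000):
    hits = winners(always_42, n)
    print(n, hits, len(hits) / n)
```

Once the run is long enough to include iteration 42, exactly one copy wins, while the fraction of winning copies keeps shrinking as the number of iterations grows.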

It would be tempting, seeing this, to discard the notion of "subjective anticipation" altogether as ill-defined. But that seems to me like tossing out the Born probabilities just because we go from Copenhagen to MWI. If I'm forked, I expect to continue my experience as either the original or the copy with a probability of 1/2 -- *whatever that means*. If I'm asked to participate in the classic Sleeping Beauty experiment, and to observe the once-flipped coin at every point I wake up, I will expect to see tails with a probability of 2/3 -- again, whatever that means.

The situations described here have a very specific set of conditions. We're dealing with complete information erasure, which prevents any kind of Bayesian update and in fact makes the situation completely symmetric from the decision agent's perspective. We're also dealing with an anticipation all the way into infinity, which cannot occur in practice due to the finite lifespan of the universe. And yet, I'm not sure what to do with the apparent need to update my anticipations for times arbitrarily far into the future, for an arbitrarily large number of copies, for outcomes with an arbitrarily high degree of causal removal from my current state, which may fail to occur, before the sequence of events that can lead to them is even put into motion.

## If a tree falls on Sleeping Beauty...

Several months ago, we had an interesting discussion about the Sleeping Beauty problem, which runs as follows:

Sleeping Beauty volunteers to undergo the following experiment. On Sunday she is given a drug that sends her to sleep. A fair coin is then tossed just once in the course of the experiment to determine which experimental procedure is undertaken. If the coin comes up heads, Beauty is awakened and interviewed on Monday, and then the experiment ends. If the coin comes up tails, she is awakened and interviewed on Monday, given a second dose of the sleeping drug, and awakened and interviewed again on Tuesday. The experiment then ends on Tuesday, without flipping the coin again. The sleeping drug induces a mild amnesia, so that she cannot remember any previous awakenings during the course of the experiment (if any). During the experiment, she has no access to anything that would give a clue as to the day of the week. However, she knows all the details of the experiment.

Each interview consists of one question, “What is your credence now for the proposition that our coin landed heads?”

In the end, the fact that there were so many reasonable-sounding arguments for both sides, and so much disagreement about a simple-sounding problem among above-average rationalists, should have set off major alarm bells. Yet only a few people pointed this out; most commenters, including me, followed the silly strategy of trying to *answer* the question, and I did so even after I *noticed* that my intuition could see both answers as being right depending on which way I looked at it, which in retrospect would have been a *perfect* time to say “I notice that I am confused” and backtrack a bit…

And on reflection, considering my confusion rather than trying to consider the question on its own terms, it seems to me that the problem (as it’s normally stated) is *completely* a tree-falling-in-the-forest problem: a debate about the normatively “correct” degree of credence which only seemed like an issue because any conclusions about what Sleeping Beauty “should” believe weren’t paying their rent, were disconnected from any expectation of feedback from reality about how right they were.
