## Publication of "Anthropic Decision Theory"

My paper "Anthropic decision theory for self-locating beliefs", based on posts here on Less Wrong, has been published as a Future of Humanity Institute tech report. Abstract:

This paper sets out to resolve how agents ought to act in the Sleeping Beauty problem and various related anthropic (self-locating belief) problems, not through the calculation of anthropic probabilities, but through finding the correct decision to make. It creates an anthropic decision theory (ADT) that decides these problems from a small set of principles. By doing so, it demonstrates that the attitude of agents with regards to each other (selfish or altruistic) changes the decisions they reach, and that it is very important to take this into account. To illustrate ADT, it is then applied to two major anthropic problems and paradoxes, the Presumptuous Philosopher and Doomsday problems, thus resolving some issues about the probability of human extinction.

Most of these ideas are also explained in this video.

To situate Anthropic Decision Theory within the UDT/TDT family: it's basically a piece of UDT applied to anthropic problems, where the UDT approach can be justified by using generally fewer, and more natural, assumptions than UDT does.

## Simplified Anthropic Doomsday

Here is a simplified version of the Doomsday argument in Anthropic decision theory, to get easier intuitions.

Assume a single agent A exists, an average utilitarian, with utility linear in money. Their species survives with 50% probability; denote this event by S. If the species survives, there will be 100 people total; otherwise the average utilitarian is the only one of its kind. An independent coin lands heads with 50% probability; denote this event by H.

Agent A must price a coupon C_{S} that pays out €1 on S, and a coupon C_{H} that pays out €1 on H. The coupon C_{S} pays out only on S, thus the reward only exists in a world where there are a hundred people, thus if S happens, the coupon C_{S} is worth (€1)/100. Hence its expected worth is (€1)/200=(€2)/400.

But H is independent of S, so (H,S) and (H,¬S) both have probability 25%. In (H,S), there are a hundred people, so C_{H} is worth (€1)/100. In (H,¬S), there is one person, so C_{H} is worth (€1)/1=€1. Thus the expected value of C_{H} is (€1)/4+(€1)/400 = (€101)/400. This is more than 50 times the value of C_{S}.

Note that C_{¬S}, the coupon that pays out on doom, has an even higher expected value of (€1)/2=(€200)/400.

So, H and S have identical probability, but A assigns C_{S} and C_{H} different expected utilities, with a higher value to C_{H}, simply because S is correlated with survival and H is independent of it (and A assigns an even higher value to C_{¬S}, which is anti-correlated with survival). This is a phrasing of the Doomsday Argument in ADT.
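The coupon prices above can be checked mechanically. The following is a minimal sketch, assuming only the numbers given in the text (50% for S, 50% for H, populations of 100 and 1); the function and variable names are mine, not part of ADT.

```python
from fractions import Fraction

# Probabilities from the text: S (survival) and H (heads) are independent, 50% each.
P_S = Fraction(1, 2)
P_H = Fraction(1, 2)
POP_IF_S, POP_IF_DOOM = 100, 1  # total population in each kind of world

def coupon_value(pays_out):
    """Expected average-utilitarian value of a coupon paying 1 euro when pays_out(h, s) holds."""
    total = Fraction(0)
    for h in (True, False):
        for s in (True, False):
            prob = (P_H if h else 1 - P_H) * (P_S if s else 1 - P_S)
            population = POP_IF_S if s else POP_IF_DOOM
            if pays_out(h, s):
                # An average utilitarian divides the payout by the population.
                total += prob * Fraction(1, population)
    return total

value_c_s = coupon_value(lambda h, s: s)
value_c_h = coupon_value(lambda h, s: h)
value_c_not_s = coupon_value(lambda h, s: not s)
print(value_c_s, value_c_h, value_c_not_s)  # 1/200 101/400 1/2
```

The ratio `value_c_h / value_c_s` comes out at 101/2, which is the "more than 50 times" figure.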

## The Doomsday argument in anthropic decision theory

**EDIT**: added a simplified version here.

*Crossposted at the intelligent agents forum.*

In Anthropic Decision Theory (ADT), behaviours that resemble the Self Sampling Assumption (SSA) derive from average utilitarian preferences (and from certain specific selfish preferences).

However, SSA implies the doomsday argument, and, to date, I hadn't found a good way to express the doomsday argument within ADT.

This post will remedy that hole, by showing how there is a natural doomsday-like behaviour for average utilitarian agents within ADT.

## Doomsday argument for Anthropic Decision Theory

tl;dr: there is no real Doomsday argument in ADT. Average utilitarians over-discount the future compared with total utilitarians, but ADT can either increase or decrease this effect. The SIA Doomsday argument can also be constructed, but this is simply a consequence of total utilitarian preferences, not of increased probability of doom.

I've been having a lot of trouble formulating a proper version of the doomsday argument for Anthropic Decision Theory (ADT). ADT mimics SIA-like decisions (for total utilitarians, those with a population independent utility function, and certain types of selfish agents), and SSA-like decisions (for average utilitarians, and a different type of selfish agent). So all paradoxes of SIA and SSA should be formulatable in it. And that is indeed the case for the presumptuous philosopher and the Adam and Eve paradox. But I haven't found a good formulation of the Doomsday argument.

And I think I know why now. It's because the Doomsday argument-like effects come from the preferences of those average utilitarian agents. Adding anthropic effects does not make the Doomsday argument stronger! It's a non-anthropic effect of those preferences. ADT may allow certain selfish agents to make acausal contracts that make them behave like average utilitarian agents, but it doesn't add any additional effect.

## Doomsday decisions

Since ADT is based on decisions, rather than probabilities, we need to formulate the Doomsday argument in decision form. The most obvious method is a decision that affects the chances of survival of future generations.

But those decisions are dominated by whether the agent desires future generations or not! Future generations of high average happiness are desired, those of lower average happiness are undesirable. This effect dominates the decisions of average utilitarians, making it hard to formulate a decision that addresses 'risk of doom' in isolation. There is one way of doing this, though: looking at how agents discount the future.

## Discounting the future

Consider the following simple model. If humanity survives for n generations, there will have been a total of Gq^{n} humans who ever lived, for some G (obviously q>1). At each generation, there is an independent probability p of surviving to the next generation, and pq < 1 (so the expected population is finite). At each generation, there is an (independent) choice of consuming a resource to get X utilities, or investing it for the next generation, who will automatically consume it for rX utilities.

Assume we are now at generation n. From the total utilitarian perspective, consuming the resource gives X with certainty, and rX with probability p. So the total utilitarian will delay consumption iff pr>1.

The average utilitarian must divide by total population. Let C be the current expected reciprocal of the population. Current consumption gives an expected XC utilities. By symmetry arguments, we can see that, if humanity survives to the next generation (an event of probability p), the expected reciprocal of population is C/q. If humanity doesn't survive, there is no delayed consumption; so the expected utility of delaying consumption is prXC/q. Therefore the average utilitarian will delay consumption iff pr/q > 1.

So the average utilitarian acts as if they discounted the future by p/q, while the total utilitarian discounts it by p. In a sense, the average utilitarian seems to fear the future more.
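The two decision rules can be compared side by side. This is a minimal sketch, assuming p is the per-generation survival probability used in the derivation above; the function names and the example numbers are my own illustrative choices.

```python
def total_utilitarian_delays(p: float, r: float) -> bool:
    """Delay iff the expected delayed payoff p*r*X exceeds the immediate payoff X."""
    return p * r > 1

def average_utilitarian_delays(p: float, q: float, r: float) -> bool:
    """Delay iff p*r*X*C/q exceeds X*C, i.e. iff p*r/q > 1."""
    return p * r / q > 1

# Illustrative numbers (assumptions): survival 0.6, growth factor 1.5, return 2.0.
p, q, r = 0.6, 1.5, 2.0
assert p * q < 1  # the model requires a finite expected population
print(total_utilitarian_delays(p, r))       # True: p*r is about 1.2
print(average_utilitarian_delays(p, q, r))  # False: p*r/q is about 0.8
```

With these numbers the total utilitarian invests while the average utilitarian consumes now, which is the extra discounting by 1/q.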

But where's the ADT in this? I've derived this result just by considering what an average utilitarian would do for any given n. Ah, but that's because of the particular choice I've made for population growth and risk rate. A proper ADT average utilitarian would compute the different p_{i} and q_{i} for all generation steps and consider the overall value of "consume now" decisions. In general, this could result in discounting that is either higher or lower than the myopic, one-generation only, average utilitarian. The easy way to see this is to imagine that p is as above (and p is small), as are almost all the q - except for q_{n}. Then the ADT average utilitarian discount rate is still roughly p/q, while the myopic average utilitarian discount rate at generation n is p/q_{n}, which could be anything.

So the "Doomsday argument" effect - the higher discounting of the future - is an artefact of average utilitarianism, while the anthropic effects of ADT can either increase or decrease this effect.

## SIA Doomsday

LessWronger turchin reminded me of Katja Grace's SIA doomsday argument. To simplify this greatly, it's the argument that since SIA prefers worlds with many people in them (most especially many people "like us"), this increases the probability that there are/were/will be many civilizations at our level of development. Hence the Great Filter - the process that stops the universe from being filled with life - is most likely in the future for our kind of civilizations. Hence the probability of doom is higher.

How does this work, translated into ADT format? Well, imagine there were two options: either the great filter is in the distant evolutionary past, or is in the future. The objective uncertainty is 50-50 on either possibility. If the great filter is in the future, your civilization has a probability p of succumbing to it (thus there is a total probability of p/2 of your civilization succumbing to a future great filter). You have the option of paying a cost C to avoid the great filter entirely for your civilization. You derive a benefit B from your civilization surviving.

Then you will pay C iff C<Bp/2. But now imagine that you are a total utilitarian, you also care about the costs and benefits from other civilizations, and you consider your decision is linked with theirs via ADT. If the great filter is early, let's assume that your civilization is the only one still in existence. If the great filter is late, then there are Ω civilizations still around.

Therefore if the great filter is early, the total cost is C (your civilization, the only one around, pays C, but gets no benefit as there is no late great filter). However, if the great filter is late, the total cost is ΩC and the total benefit is ΩBp (all of Ω civilizations pay C and get benefit B with probability p). Since the two cases are equally likely, the expected utility gain is [ΩBp-(Ω+1)C]/2. So you will pay the cost iff C < BpΩ/(Ω+1).

To an outsider this looks like you believe the probability of a late great filter is Ω/(Ω+1), rather than 0.5. However, this is simply a consequence of your total utilitarian preferences, and doesn't reflect an objectively larger chance of death.
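The two payment thresholds can be put next to each other. The sketch below assumes p is the probability that a late filter stops a given civilization, as in the "p/2" figure above; the function names and the example values of B, p and Ω are hypothetical.

```python
from fractions import Fraction

def unlinked_threshold(B, p):
    """Largest cost worth paying when you only count your own civilization: B*p/2."""
    return Fraction(B) * p / 2

def linked_total_utilitarian_threshold(B, p, omega):
    """Largest cost worth paying when all omega late-filter civilizations decide alike."""
    return Fraction(B) * p * omega / (omega + 1)

def apparent_late_filter_probability(omega):
    """The probability an outsider would infer from the linked agent's behaviour."""
    return Fraction(omega, omega + 1)

# Illustrative values (assumptions): benefit 100, filter risk 1/10, 9 civilizations.
B, p, omega = 100, Fraction(1, 10), 9
print(unlinked_threshold(B, p))                             # 5
print(linked_total_utilitarian_threshold(B, p, omega))      # 9
print(apparent_late_filter_probability(omega))              # 9/10
```

The objective 50-50 uncertainty never changes in this calculation; only the weighting of outcomes does.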

## The Interrupted Ultimate Newcomb's Problem

While figuring out my error in my solution to the Ultimate Newcomb's Problem, I ran across this (distinct) reformulation that helped me distinguish between what I was doing and what the problem was actually asking.

... but that being said, I'm not sure if my answer to the reformulation is correct either.

The question, cleaned for Discussion, looks like this:

You approach the boxes and lottery, which are exactly as in the UNP. Before reaching it, you come to a sign with a flashing red light. The sign reads: "INDEPENDENT SCENARIO BEGIN."

Omega, who has predicted that you will be confused, shows up to explain: "This is considered an artificially independent experiment. Your algorithm for solving this problem will not be used in my simulations of your algorithm for my various other problems. In other words, you are allowed to two-box here but one-box Newcomb's problem, or vice versa."

This is motivated by the realization that I've been making the same mistake as in the original Newcomb's Problem, though this justification does not (I believe) apply to the original. The mistake is simply this: that I assumed that I simply appear *in medias res*. When solving the UNP, it is (seems to be) important to remember that you may be in some very rare edge case of the main problem, and that you are choosing your algorithm for the problem as a whole.

But if that's *not* true - if you're allowed to appear in the middle of the problem, and no counterfactual-yous are at risk - it sure seems like two-boxing is justified - as khafra put it, "trying to ambiently control basic arithmetic".

(Speaking of which, is there a write up of ambient decision theory anywhere? For that matter, is there any compilation of decision theories?)

EDIT: (Yes to the first, though not under that name: Controlling Constant Programs.)

## A model of UDT with a concrete prior over logical statements

I've been having difficulties with constructing a toy scenario for AI self-modification more interesting than Quirrell's game, because you really want to do expected utility maximization of some sort, but currently our best-specified decision theories search through the theorems of one particular proof system and "break down and cry" if they can't find one that tells them what their utility will be if they choose a particular option. This is fine if the problems are simple enough that we always find the theorems we need, but the AI rewrite problem is precisely about skirting that edge. It seems natural to want to choose some probability distribution over the possibilities that you can't rule out, and then do expected utility maximization (because if you don't maximize EU over some prior, it seems likely that someone could Dutch-book you); indeed, Wei Dai's original UDT has a "mathematical intuition module" black box which this would be an implementation of. But how *do* you assign probabilities to logical statements? What consistency conditions do you ask for? What are the "impossible possible worlds" that make up your probability space?

Recently, Wei Dai suggested that logical uncertainty might help avoid the Löbian problems with AI self-modification, and although I'm sceptical about this idea, the discussion pushed me into trying to confront the logical uncertainty problem head-on; then, reading Haim Gaifman's paper "Reasoning with limited resources and assigning probabilities to logical statements" (which Luke linked from So you want to save the world) made something click. I want to present a simple suggestion for a concrete definition of "impossible possible world", for a prior over them, and for a UDT algorithm based on that. I'm not sure whether the concrete prior is useful (the main point in giving it is to have a concrete example we can try to prove things about), but the definition of logical possible worlds looks like a promising theoretical tool to me.

## Consequentialist Formal Systems

*This post describes a different (less agent-centric) way of looking at UDT-like decision theories that resolves some aspects of the long-standing technical problem of spurious moral arguments. It's only a half-baked idea, so there are currently a lot of loose ends.*

### On spurious arguments

UDT agents are usually considered as having a disinterested inference system (a "mathematical intuition module" in UDT and first order proof search in ADT) that plays a purely epistemic role, and preference-dependent decision rules that look for statements that characterize possible actions in terms of the utility value that the agent optimizes.

The statements (supplied by the inference system) used by agent's decision rules (to pick one of the many variants) have the form **[(A=A1 => U=U1) and U<=U1]**. Here, **A** is a symbol defined to be the actual action chosen by the agent, **U** is a similar symbol defined to be the actual value of world's utility, and **A1** and **U1** are some particular possible action and possible utility value. If the agent finds that this statement is provable, it performs action **A1**, thereby making **A1** the actual action.

The use of this statement introduces the problem of spurious arguments: if **A1** is a bad action, but for some reason it's still chosen, then **[(A=A1 => U=U1) and U<=U1]** is true, since utility value **U** will in that case be in fact **U1**, which justifies (by the decision rule) choosing the bad action **A1**. In usual cases, this problem results in the difficulty of proving that an agent will behave in the expected manner (i.e. won't choose a bad action), which is resolved by adding various complicated clauses to its decision algorithm. But even worse, it turns out that if an agent is hapless enough to take seriously a (formally correct) proof of such a statement supplied by an enemy (or if its own inference system is malicious), it can be persuaded to take any action at all, irrespective of the agent's own preferences.
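The self-fulfilling character of the statement can be seen in a toy world. This is only a sketch under stated assumptions: the two-action world (5 for action 1, 10 otherwise) and the function names are hypothetical, and the code checks the *truth* of the statement in a given world rather than its provability.

```python
def world_utility(action: int) -> int:
    # Hypothetical toy world: action 1 is the bad action.
    return 5 if action == 1 else 10

def statement_holds(actual_action: int, a1: int, u1: int) -> bool:
    """Truth of [(A=a1 => U=u1) and U<=u1] in the world where A is actual_action."""
    u = world_utility(actual_action)
    implication = (actual_action != a1) or (u == u1)  # material implication
    return implication and u <= u1

# If the bad action 1 is in fact chosen, the statement recommending it is true.
print(statement_holds(actual_action=1, a1=1, u1=5))   # True
# The same statement is false in the world where the agent picks action 2.
print(statement_holds(actual_action=2, a1=1, u1=5))   # False
# The statement for the good action is true when the good action is chosen.
print(statement_holds(actual_action=2, a1=2, u1=10))  # True
```

Each recommendation is true exactly in the world where the agent follows it, so truth alone cannot separate the good argument from the spurious one.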

## An example of self-fulfilling spurious proofs in UDT

Benja Fallenstein was the first to point out that spurious proofs pose a problem for UDT. Vladimir Nesov and orthonormal asked for a formalization of that intuition. In this post I will give an example of a UDT-ish agent that fails due to having a malicious proof searcher, which feeds the agent a spurious but valid proof.

The basic idea is to have an agent A that receives a proof P as input, and checks P for validity. If P is a valid proof that a certain action a is best in the current situation, then A outputs a, otherwise A tries to solve the current situation by its own means. Here's a first naive formalization, where U is the world program that returns a utility value, A is the agent program that returns an action, and P is the proof given to A:

    def U():
        if A(P) == 1:
            return 5
        else:
            return 10

    def A(P):
        if P is a valid proof that A(P)==a implies U()==u, and A(P)!=a implies U()<=u:
            return a
        else:
            do whatever

This formalization cannot work because a proof P can never be long enough to contain statements about A(P) inside itself. To fix that problem, let's introduce a function Q that generates the proof P:

    def U():
        if A(Q()) == 1:
            return 5
        else:
            return 10

    def A(P):
        if P is a valid proof that A(Q())==a implies U()==u, and A(Q())!=a implies U()<=u:
            return a
        else:
            do whatever

In this case it's possible to write a function Q that returns a proof that makes A return the suboptimal action 1, which leads to utility 5 instead of 10. Here's how:

Let X be the statement "A(Q())==1 implies U()==5, and A(Q())!=1 implies U()<=5". Let Q be the program that enumerates all possible proofs trying to find a proof of X, and returns that proof if found. (The definitions of X and Q are mutually quined.) If X is provable at all, then Q will find that proof, and X will become true (by inspection of U and A). That reasoning is formalizable in our proof system, so the statement "if X is provable, then X" is provable. Therefore, by Löb's theorem, X is provable. So Q will find a proof of X, and A will return 1.
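The shape of the argument above can be written out in two lines. This is a sketch in standard provability-logic notation, where $\Box X$ abbreviates "X is provable in the agent's proof system"; the notation is my addition and does not appear in the original argument.

```latex
\begin{align*}
\vdash\ \Box X \rightarrow X \quad &\text{(if $X$ is provable, $Q$ finds a proof, so $A(Q())=1$ and $U()=5$)} \\
\vdash\ X \quad &\text{(by L\"ob's theorem applied to the line above)}
\end{align*}
```

The first line is the step that depends on inspecting U, A and Q; the second line is the general theorem and uses nothing specific to this problem.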

One possible conclusion is that a UDT agent cannot use just any proof searcher or "mathematical intuition module" that's guaranteed to return valid mathematical arguments, because valid mathematical arguments can make the agent choose arbitrary actions. The proof searchers from some previous posts were well-behaved by construction, but not all of them are.

The troubling thing is that you may end up with a badly behaved proof searcher by accident. For example, consider a variation of U that adds some long and complicated computation to the "else" branch of U, before returning 10. That increases the length of the "natural" proof that a=2 is optimal, but the spurious proof for a=1 stays about the same length as it was, because the spurious proof can just ignore the "else" branch of U. This way the spurious proof can become much shorter than the natural proof. So if (for example) your math intuition module made the innocuous design decision of first looking at actions that are likely to have shorter proofs, you may end up with a spurious proof. And as a further plot twist, if we make U return 0 rather than 10 in the long-to-compute branch, you might choose the *correct* action due to a spurious proof instead of the natural one.

## The limited predictor problem

This post requires some knowledge of logic, computability theory, and K-complexity. Much of the credit goes to Wei Dai. The four sections of the post can be read almost independently.

The limited predictor problem (LPP) is a version of Newcomb's Problem where the predictor has limited computing resources. To predict the agent's action, the predictor simulates the agent for N steps. If the agent doesn't finish in N steps, the predictor assumes that the agent will two-box. LPP is similar to the ASP problem, but with simulation instead of theorem proving.
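A toy version of the setup may help fix ideas. This is a minimal sketch under stated assumptions: agents are modelled as Python generators that yield once per computation step, and the payoffs (1,000,000 and 1,000) are the conventional Newcomb amounts, which the post itself does not specify. All names are hypothetical.

```python
def run_for(agent_factory, max_steps):
    """Simulate an agent for at most max_steps steps; return its action, or None if unfinished."""
    gen = agent_factory()
    for _ in range(max_steps):
        try:
            next(gen)  # advance the agent by one step
        except StopIteration as stop:
            return stop.value
    return None

def run_to_completion(agent_factory):
    """Run the agent with no step limit and return its action."""
    gen = agent_factory()
    while True:
        try:
            next(gen)
        except StopIteration as stop:
            return stop.value

def lpp_utility(agent_factory, n_steps):
    # The predictor assumes two-boxing whenever the simulation does not finish.
    prediction = run_for(agent_factory, n_steps) or "two-box"
    action = run_to_completion(agent_factory)
    big_box = 1_000_000 if prediction == "one-box" else 0
    return big_box if action == "one-box" else big_box + 1_000

def quick_one_boxer():
    yield  # a single step of deliberation
    return "one-box"

def slow_one_boxer():
    for _ in range(1_000):  # deliberates for longer than a small N allows
        yield
    return "one-box"

def quick_two_boxer():
    yield
    return "two-box"

print(lpp_utility(quick_one_boxer, 10))  # 1000000
print(lpp_utility(slow_one_boxer, 10))   # 0
print(lpp_utility(quick_two_boxer, 10))  # 1000
```

The slow one-boxer is the interesting case: it makes the right choice but takes too long to be predicted as making it.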

#### 1. Solving the problem when the agent has a halting oracle

Consider the agent defined in "A model of UDT with a halting oracle", and a predictor that can run the agent's code step-by-step, with oracle calls and all. Turns out that this agent solves LPP correctly if N is high enough. To understand why, note that the agent offloads all interesting work to oracles that return instantly, so the agent's own runtime is provably bounded. If that bound is below N, the agent's oracle will prove that the predictor predicts the agent correctly, so the agent will one-box.

#### 2. Failing to solve the problem when N is algorithmically random

Consider a setting without oracles, with only Turing-computable programs. Maybe the agent should successively search for proofs somehow?

Unfortunately you can't solve most LPPs this way, for a simple but surprising reason. Assume that the predictor's time limit N is a large and algorithmically random number. Then the predictor's source code is >log(N) bits long, because N must be defined in the source code. Then any proof about the world program must also have length >log(N), because the proof needs to at least quote the world program itself. Finding a proof by exhaustive search takes exponential time, so the agent will need >N steps. But the predictor simulates the agent for only N steps. Whoops!
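The counting behind this argument is short enough to check directly. The sketch below assumes that proofs are encoded as bitstrings, that an incompressible N forces more than log2(N) bits, and that exhaustive search may have to examine every string of that length; the function names are mine.

```python
def min_proof_bits(n: int) -> int:
    """Smallest whole number of bits strictly greater than log2(n), for n >= 1."""
    return n.bit_length()

def candidates_of_length(bits: int) -> int:
    """Number of distinct bitstrings of exactly this length."""
    return 2 ** bits

n = 10 ** 6  # stand-in for the predictor's time limit N
bits = min_proof_bits(n)
print(bits)                            # 20
print(candidates_of_length(bits))      # 1048576
print(candidates_of_length(bits) > n)  # True
```

Even at one step per candidate, the search space at the minimum proof length already exceeds the N steps the predictor is willing to simulate.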

#### 3. Solving the problem when N is large but has a short definition

As usual, let U be the world program that returns a utility value, and A be the agent program that returns an action and has access to the world's source code. Consider the following algorithm for A:

1. From L=1 to infinity, search for proofs up to length L of the form "if A()=a and runtime(A)<g(L), then U()=u", where g(L) is an upper bound on runtime(A) if A stops the search at length L. Upon finding at least one proof for each possible a, go to step 2.
2. Search for proofs up to length f(L) of the form "if runtime(A)<g(L), then A()≠a", where f(L) is some suitably fast-growing function like 10^L. If such a proof is found, return a.
3. If we're still here, return the best a found on step 1.

This algorithm is very similar to the one described in "A model of UDT without proof limits", but with the added complication that A is aware of its own runtime via the function g(L). By an analogous argument, A will find the "intended" proof that the predictor predicts A correctly if runtime(A) is small enough, as long as the "intended" proof exists and isn't too long relative to the predictor's time limit N. More concretely, A will solve all instances of LPP in which N is larger than g(L), where L is the length of the "intended" proof. For example, if f(L)=10^L, then g(L) is doubly exponential, so A will successfully solve LPPs where the predictor's source code defines N using triple exponentials or some more compact notation.

#### 4. A broader view

TDT and UDT were originally designed for solving "decision-determined" problems. The agent figures out how the resulting utility logically depends on the agent's action, then returns the action with the highest utility, thus making the premise true.

But a cleverly coded decision program can also control other facts about itself. For example, the program may figure out how the resulting utility depends on the program's return value *and running time*, then choose the best return value *and choose how long to keep running*, thus making both premises true. This idea is a natural extension of quining (you carefully write a program that can correctly judge its own runtime so far) and can be generalized to memory consumption and other properties of programs.

With enough cleverness we could write a program that would sometimes decide to waste time, or run for an even number of clock cycles, etc. We did not need so much cleverness in this post because LPP lies in a smaller class that we may call "LPP-like problems", where utility depends only on the agent's return value and runtime, and the dependence on runtime is monotonic - it never hurts to return the same value earlier. That class also includes all the usual decision-determined problems like Newcomb's Problem, and our A also fares well on those.

I was surprised to find so many new ideas by digging into such a trivial-looking problem as LPP. This makes me suspect that advanced problems like ASP may conceal even more riches, if only we have enough patience to approach them properly...

## A model of UDT without proof limits

This post requires some knowledge of decision theory math. Part of the credit goes to Vladimir Nesov.

Let the universe be a computer program U that returns a utility value, and let the agent be a subprogram A within U that knows the source code of both A and U. (The same setting was used in the reduction of "could" post.) Here's a very simple decision problem:

    def U():
        if A() == 1:
            return 5
        else:
            return 10

The algorithm for A will be as follows:

1. Search for proofs of statements of the form "A()=a implies U()=u". Upon finding at least one proof for each possible a, go to step 2.
2. Let L be the maximum length of proofs found on step 1, and let f(L) be some suitably fast-growing function like 10^L. Search for proofs shorter than f(L) of the form "A()≠a". If such a proof is found, return a.
3. If we're still here, return the best a found on step 1.
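The control flow of the three steps can be sketched with a mock proof enumerator. This is an illustration under heavy assumptions: a real agent would enumerate proofs in a formal system, whereas here a hand-written list of (length, statement) pairs stands in for the search, and the tuple encoding of statements is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical mock of a proof enumerator: (proof length, statement) pairs.
# ("implies", a, u) stands for "A()=a implies U()=u"; ("not", a) for "A()≠a".
Proof = Tuple[int, tuple]

def agent(proofs: List[Proof], actions: List[int],
          f: Callable[[int], int] = lambda L: 10 ** L) -> int:
    by_length = sorted(proofs, key=lambda pr: pr[0])
    found: Dict[int, Tuple[int, int]] = {}  # action -> (utility, proof length)
    # Step 1: keep the first (shortest) proof of "A()=a implies U()=u" for each a.
    for length, stmt in by_length:
        if stmt[0] == "implies" and stmt[1] in actions and stmt[1] not in found:
            found[stmt[1]] = (stmt[2], length)
    if len(found) < len(actions):
        raise RuntimeError("mock list exhausted; the real agent would keep searching")
    # Step 2: L is the longest proof used in step 1; look for refutations below f(L).
    L = max(length for _, length in found.values())
    for length, stmt in by_length:
        if stmt[0] == "not" and length < f(L):
            return stmt[1]
    # Step 3: no refutation found, so return the best action from step 1.
    return max(found, key=lambda a: found[a][0])

intended = [(40, ("implies", 1, 5)), (45, ("implies", 2, 10))]
print(agent(intended, [1, 2]))  # 2

# A deliberately unsound mock: a short spurious proof plus the refutation it enables.
spurious = intended + [(30, ("implies", 2, 0)), (90, ("not", 2))]
print(agent(spurious, [1, 2]))  # 2
```

In the second run the agent picks up the spurious proof first, then finds "A()≠2" within the f(L) budget and returns 2 anyway. In a sound proof system that combination cannot occur, which is the contradiction the correctness argument relies on.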

The usual problem with such proof-searching agents is that they might stumble upon "spurious" proofs, e.g. a proof that A()==2 implies U()==0. If A finds such a proof and returns 1 as a result, the statement A()==2 becomes false, and thus provably false under any formal system; and a false statement implies anything, making the original "spurious" proof correct. The reason for constructing A this particular way is to have a shot at proving that A won't stumble on a "spurious" proof before finding the "intended" ones. The proof goes as follows:

Assume that A finds a "spurious" proof on step 1, e.g. that A()=2 implies U()=0. We have a lower bound on L, the length of that proof: it's likely larger than the length of U's source code, because a proof needs to at least state what's being proved. Then in this simple case 10^L steps is clearly enough to also find the "intended" proof that A()=2 implies U()=10, which combined with the previous proof leads to a similarly short proof that A()≠2, so the agent returns 2. But that can't happen if A's proof system is sound, therefore A will find only "intended" proofs rather than "spurious" ones in the first place.

Quote from Nesov that explains what's going on:

With this algorithm, you're not just passively gauging the proof length, instead you take the first moral argument you come across, and then actively defend it against any close competition

By analogy we can see that A coded with f(L)=10^L will correctly solve all our simple problems like Newcomb's Problem, the symmetric Prisoner's Dilemma, etc. The proof of correctness will rely on the syntactic form of each problem, so the proof may break when you replace U with a logically equivalent program. But that's okay, because "logically equivalent" for programs simply means "returns the same value", and we don't want all world programs that return the same value to be *decision-theoretically* equivalent.

A will fail on problems where "spurious" proofs are exponentially shorter than "intended" proofs (or even shorter, if f(L) is chosen to grow faster). We can probably construct malicious examples of decision-determined problems that would make A fail, but I haven't found any yet.

## Predictability of Decisions and the Diagonal Method

*This post collects a few situations where agents might want to make their decisions either predictable or unpredictable to certain methods of prediction, and considers a method of making a decision unpredictable by "diagonalizing" a hypothetical prediction of that decision. The last section takes a stab at applying this tool to the ASP problem.*

### The diagonal step

To start off, consider the halting problem, interpreted in terms of agents and predictors. Suppose that there is a Universal Predictor, an algorithm that is able to decide whether any given program halts or runs forever. Then, it's easy for a program (agent) to evade its gaze by including a *diagonal step* in its decision procedure: the agent checks (by simulation) if Universal Predictor comes to some decision about the agent, and if it does, the agent acts contrary to the Predictor's decision. This makes the prediction wrong, and Universal Predictors impossible.

The same trick could be performed against something that could exist, normal non-universal Predictors, which allows an agent to make itself immune to their predictions. In particular, ability of other agents to infer decisions of our agent may be thought of as prediction that an agent might want to hinder. This is possible so long as the predictors in question can be simulated in enough detail, that is it's known what they do (what they know) and our agent has enough computational resources to anticipate their hypothetical conclusions. (If an agent does perform the diagonal step with respect to other agents, the predictions of other agents don't necessarily become wrong, as they could be formally correct by construction, but they cease to be possible, which could mean that the predictions won't be made at all.)
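The diagonal step itself is a one-liner once the predictor can be called. Below is a minimal sketch, assuming binary actions and predictors that return an answer without calling the agent back (one that did would recurse without end in this toy); all names are hypothetical.

```python
from typing import Callable

# Hypothetical type: a predictor maps an agent to a predicted action in {0, 1}.
Predictor = Callable[[Callable], int]

def diagonal_agent(predictor: Predictor) -> int:
    """Simulate the predictor on ourselves, then act contrary to its prediction."""
    predicted = predictor(diagonal_agent)
    return 1 - predicted

def always_zero(agent) -> int:
    return 0

def always_one(agent) -> int:
    return 1

def by_name_parity(agent) -> int:
    # An arbitrary rule that inspects the agent without running it.
    return len(agent.__name__) % 2

for predictor in (always_zero, always_one, by_name_parity):
    # The prediction is wrong for every predictor the agent can simulate.
    print(predictor.__name__, predictor(diagonal_agent), diagonal_agent(predictor))
```

The agent needs no knowledge of how the predictor reasons, only the ability to run it to completion, which is the resource condition stated above.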

## Anthropic Decision Theory VI: Applying ADT to common anthropic problems

A near-final version of my Anthropic Decision Theory paper is available on the arXiv. Since anthropics problems have been discussed quite a bit on this list, I'll be presenting its arguments and results in this and previous posts 1 2 3 4 5 6.

Having presented ADT previously, I'll round off this mini-sequence by showing how it behaves with common anthropic problems, such as the Presumptuous Philosopher, Adam and Eve problem, and the Doomsday argument.

## The Presumptuous Philosopher

The Presumptuous Philosopher was introduced by Nick Bostrom as a way of pointing out the absurdities in SIA. In the setup, the universe either has a trillion observers, or a trillion trillion trillion observers, and physics is indifferent as to which one is correct. Some physicists are preparing to do an experiment to determine the correct universe, until a presumptuous philosopher runs up to them, claiming that his SIA probability makes the larger one nearly certainly the correct one. In fact, he will accept bets at a trillion trillion to one odds that he is in the larger universe, repeatedly defying even strong experimental evidence with his SIA probability correction.

What does ADT have to say about this problem? Implicitly, when the problem is discussed, the philosopher is understood to be selfish towards any putative other copies of himself (similarly, Sleeping Beauty is often implicitly assumed to be selfless, which may explain the divergence of intuitions that people have on the two problems). Are there necessarily other similar copies? Well, in order to use SIA, the philosopher must believe that there is nothing blocking the creation of presumptuous philosophers in the larger universe; for if there were, the odds would shift away from the larger universe (in the extreme case when only one presumptuous philosopher is allowed in any universe, SIA finds them equi-probable). So the expected number of presumptuous philosophers in the larger universe is a trillion trillion times greater than the expected number in the small universe.

## Anthropic Decision Theory V: Linking and ADT

A near-final version of my Anthropic Decision Theory paper is available on the arXiv. Since anthropics problems have been discussed quite a bit on this list, I'll be presenting its arguments and results in this, subsequent, and previous posts 1 2 3 4 5 6.

Now that we've seen what the 'correct' decision is for various Sleeping Beauty Problems, let's see a decision theory that reaches the same conclusions.

## Linked decisions

Identical copies of Sleeping Beauty will make the same decision when faced with the same situations (technically true until quantum and chaotic effects cause a divergence between them, but most decision processes will not be sensitive to random noise like this). Similarly, Sleeping Beauty and the random man on the street will make the same decision when confronted with a twenty pound note: they will pick it up. However, while we could say that the first situation is linked, the second is coincidental: were Sleeping Beauty to refrain from picking up the note, the man on the street would not so refrain, while her copy would.

The above statement brings up subtle issues of causality and counterfactuals, a deep philosophical debate. To sidestep it entirely, let us recast the problem in programming terms, seeing the agent's decision process as a deterministic algorithm. If agent α is an agent that follows an automated decision algorithm A, then if A knows its own source code (by quining for instance), it might have a line saying something like:

Module M: If B is another algorithm, belonging to agent β, identical with A ('yourself'), assume A and B will have identical outputs on identical inputs, and base your decision on this.
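As a rough illustration of Module M, here is a toy sketch in Python. It makes strong simplifying assumptions: "identical with A" is modelled as plain string equality of a stand-in source string, and the agent evaluates each action as if every linked copy takes it too. All names (`is_linked`, `count_linked`, `decide`, `coupon_payoff`) and the price of £0.4 are hypothetical, chosen only for the example.

```python
def is_linked(my_source, other_source):
    """Module M's test: treat another algorithm as 'yourself' if its source is identical."""
    return my_source == other_source


def count_linked(my_source, other_sources):
    """Number of other agents whose output is assumed to equal ours."""
    return sum(is_linked(my_source, s) for s in other_sources)


def decide(my_source, other_sources, payoff, actions):
    """Choose the best action, assuming every linked copy chooses it too.

    payoff(action, n_linked) is a hypothetical function giving the combined
    result when this agent and its n_linked copies all take `action`.
    """
    n_linked = count_linked(my_source, other_sources)
    return max(actions, key=lambda a: payoff(a, n_linked))


# Stand-in "source code" strings for Sleeping Beauty and the man on the street.
BEAUTY = "algorithm A"
STRANGER = "some other algorithm"


def coupon_payoff(action, n_linked, price=0.4):
    """Combined gain in the tails world when a £1 coupon costs `price`."""
    if action == "buy":
        return (1 + n_linked) * (1 - price)
    return 0.0


choice = decide(BEAUTY, [BEAUTY, STRANGER], coupon_payoff, ["buy", "pass"])
print(choice, count_linked(BEAUTY, [BEAUTY, STRANGER]))
```

Only the copy counts as linked here; the stranger may well take the same action, but the algorithm does not assume so when deciding.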

## Anthropic Decision Theory IV: Solving Selfish and Average-Utilitarian Sleeping Beauty

In the previous post, I looked at a decision problem when Sleeping Beauty was selfless or a (copy-)total utilitarian. Her behaviour was reminiscent of someone following SIA-type odds. Here I'll look at situations where her behaviour is SSA-like.

**Altruistic average utilitarian Sleeping Beauty**

In the incubator variant, consider the reasoning of an Outside/Total agent who is an average utilitarian (and there are no other agents in the universe apart from the Sleeping Beauties).

"If the various Sleeping Beauties decide to pay £x for the coupon, they will make -£x in the heads world. In the tails world, they will each make £(1-x), so an average of £(1-x). This gives me an expected utility of £0.5(-x+(1-x))= £(0.5-x), so I would want them to buy the coupon for any price less than £0.5."

And this will then be the behaviour the agents will follow, by consistency. Thus they would be behaving as if they were following SSA odds, and putting equal probability on the heads versus tails world.
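The arithmetic in the quoted reasoning can be checked with a short sketch. It assumes the incubator setup as described (one Beauty on heads, two on tails, a coupon paying £1 on tails, a fair coin); the function name `average_utilitarian_eu` is mine, not the paper's.

```python
from fractions import Fraction


def average_utilitarian_eu(x, p_heads=Fraction(1, 2)):
    """Expected average utility if every Sleeping Beauty pays x for the coupon."""
    heads_avg = -x                        # one Beauty pays x, coupon pays nothing
    tails_avg = ((1 - x) + (1 - x)) / 2   # two Beauties, each gains 1 - x
    return p_heads * heads_avg + (1 - p_heads) * tails_avg


for price in (Fraction(2, 5), Fraction(1, 2), Fraction(3, 5)):
    print(price, average_utilitarian_eu(price))
```

The expected utility is positive below a price of £0.5, zero at exactly £0.5, and negative above it, matching the £(0.5-x) formula.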

## Anthropic Decision Theory III: Solving Selfless and Total Utilitarian Sleeping Beauty

**Consistency**

In order to transform the Sleeping Beauty problem into a decision problem, assume that every time she is awoken, she is offered a coupon that pays out £1 if the coin fell tails. She must then decide at what cost she is willing to buy that coupon.

The very first axiom is that of temporal consistency. If your preferences are going to predictably change, then someone will be able to exploit this, by selling you something now that they will buy back for more later, or vice versa. This axiom is implicit in the independence axiom in the von Neumann-Morgenstern axioms of expected utility, where non-independent decisions show inconsistency after partially resolving one of the lotteries. For our purposes, we will define it as:

## Anthropic Decision Theory II: Self-Indication, Self-Sampling and decisions

In the last post, we saw the Sleeping Beauty problem, and the question was what probability a recently awoken or created Sleeping Beauty should give to the coin falling heads or tails and it being Monday or Tuesday when she is awakened (or whether she is in Room 1 or 2). There are two main schools of thought on this, the Self-Sampling Assumption and the Self-Indication Assumption, which give different probabilities for these events.

## The Self-Sampling Assumption

The self-sampling assumption (SSA) relies on the insight that Sleeping Beauty, before being put to sleep on Sunday, expects that she will be awakened in future. Thus her awakening grants her no extra information, and she should continue to give the same credence to the coin flip being heads as she did before, namely 1/2.

In the case where the coin is tails, there will be two copies of Sleeping Beauty, one on Monday and one on Tuesday, and she will not be able to tell, upon awakening, which copy she is. She should assume that both are equally likely. This leads to SSA:

## Anthropic decision theory I: Sleeping beauty and selflessness

A near-final version of my Anthropic Decision Theory paper is available on the arXiv. Since anthropics problems have been discussed quite a bit on this list, I'll be presenting its arguments and results in this and subsequent posts 1 2 3 4 5 6.

*Many thanks to Nick Bostrom, Wei Dai, Anders Sandberg, Katja Grace, Carl Shulman, Toby Ord, Anna Salamon, Owen Cotton-Barratt, and Eliezer Yudkowsky.*

## The Sleeping Beauty problem, and the incubator variant

The Sleeping Beauty problem is a major one in anthropics, and my paper establishes anthropic decision theory (ADT) by a careful analysis of it. Therefore we should start with an explanation of what it is.

In the standard setup, Sleeping Beauty is put to sleep on Sunday, and awoken again Monday morning, without being told what day it is. She is put to sleep again at the end of the day. A fair coin was tossed before the experiment began. If that coin showed heads, she is never reawakened. If the coin showed tails, she is fed a one-day amnesia potion (so that she does not remember being awake on Monday) and is reawakened on Tuesday, again without being told what day it is. At the end of Tuesday, she is put to sleep for ever. This is illustrated in the next figure: