Anthropic Decision Theory V: Linking and ADT

Stuart_Armstrong

A near-final version of my Anthropic Decision Theory paper is available on the arXiv. Since anthropics problems have been discussed quite a bit on this list, I'll be presenting its arguments and results in this, subsequent, and previous posts 1 2 3 4 5 6.

Now that we've seen what the 'correct' decision is for various Sleeping Beauty Problems, let's see a decision theory that reaches the same conclusions.

Linked decisions

Identical copies of Sleeping Beauty will make the same decision when faced with the same situations (technically true until quantum and chaotic effects cause a divergence between them, but most decision processes will not be sensitive to random noise like this). Similarly, Sleeping Beauty and the random man on the street will make the same decision when confronted with a twenty pound note: they will pick it up. However, while we could say that the first situation is linked, the second is coincidental: were Sleeping Beauty to refrain from picking up the note, the man on the street would not so refrain, while her copy would.

The above statement brings up subtle issues of causality and counterfactuals, a deep philosophical debate. To sidestep it entirely, let us recast the problem in programming terms, seeing the agent's decision process as a deterministic algorithm. If agent α is an agent that follows an automated decision algorithm A, then if A knows its own source code (by quining for instance), it might have a line saying something like:

Module M: If B is another algorithm, belonging to agent β, identical with A ('yourself'), assume A and B will have identical outputs on identical inputs, and base your decision on this.

This could lead, for example, to α and β cooperating in a symmetric Prisoner's Dilemma. And there is no problem with A believing the above assumption, as it is entirely true: identical deterministic algorithms on the same input do produce the same outputs. With this in mind, we give an informal definition of a linked decision as:

Linked decisions: Agent α's decisions are linked with agent β's, if both can prove they will both make the same decision, even after taking into account the fact they know they are linked.

An example of agents that are not linked would be two agents α and β, running identical algorithms A and B on identical data, except that A has module M while B doesn't. Then A's module might correctly deduce that they will output the same decision, but only if A disregards the difference between them, i.e. module M. So A can 'know' they will output the same decision, but if it acts on that knowledge, it makes that knowledge incorrect. If A and B both had module M, then they could both act on the knowledge and it would remain correct.
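To make the role of module M concrete, here is a minimal sketch of the symmetric Prisoner's Dilemma case. The payoff numbers and the function names (PAYOFF, dominant_action, decide) are assumptions chosen for illustration; they are not taken from the paper.

```python
# Hypothetical payoffs for a symmetric Prisoner's Dilemma.
# Key is (my move, their move); value is my payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def dominant_action() -> str:
    """Return the move that is at least as good whatever the other agent does."""
    for mine in "CD":
        alternative = "D" if mine == "C" else "C"
        if all(PAYOFF[(mine, b)] >= PAYOFF[(alternative, b)] for b in "CD"):
            return mine
    raise ValueError("no dominant action in this payoff table")


def decide(has_module_m: bool, other_looks_identical: bool) -> str:
    """One-shot decision of an agent, with or without module M."""
    if has_module_m and other_looks_identical:
        # Module M: assume identical outputs, so only (C, C) and (D, D)
        # are treated as reachable outcomes.
        return max("CD", key=lambda move: PAYOFF[(move, move)])
    # Without module M, treat the other agent's move as independent.
    return dominant_action()


# Both agents have module M: the assumption stays true once acted on.
both_have_m = (decide(True, True), decide(True, True))

# Only A has module M: A acts on the assumption and thereby falsifies it.
only_a_has_m = (decide(True, True), decide(False, True))
```

In the first case both agents cooperate, so their decisions are linked. In the second case A cooperates and B defects, which is the situation described above where acting on the 'knowledge' makes it incorrect.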

ADT

Given the above definition, anthropic decision theory (ADT) can be simply stated as:

Anthropic Decision Theory (ADT): An agent should first find all the decisions linked with their own. Then they should maximise expected utility, acting as if they simultaneously controlled the outcomes of all linked decisions, and using the objective (non-anthropic) probabilities of the various worlds.

ADT is similar to SSA in that it makes use of reference classes. However, SSA needs to have the reference class information established separately before it can calculate probabilities, and different reference classes give very different results. In contrast, the reference class for ADT is part of the definition. It is not the class of identical or similar agents; instead, it is the class of linked decisions which (by definition) is the class of decisions that the agent can prove are linked. Hence the whole procedure is perfectly deterministic, and known for a given agent.
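As a rough sketch of the ADT procedure, assuming the set of linked decisions has already been found: the agent picks one action, imagines every linked decision outputting that same action, and weights worlds by their objective probabilities. The names adt_choice and make_selfless_utility are hypothetical, and the coupon payoffs (one agent in the heads world, two in the tails world, coupon paying £1 on tails) are an assumption read off the coupon arithmetic in this post.

```python
from typing import Callable, Dict, Sequence


def adt_choice(
    actions: Sequence[str],
    worlds: Dict[str, float],
    utility: Callable[[str, str], float],
) -> str:
    """Pick the action maximising expected utility under ADT.

    worlds maps each world to its objective (non-anthropic) probability.
    utility(world, action) is the agent's utility in that world if every
    linked decision outputs `action`.
    """
    def expected(action: str) -> float:
        return sum(p * utility(world, action) for world, p in worlds.items())

    return max(actions, key=expected)


def make_selfless_utility(price: float) -> Callable[[str, str], float]:
    """Assumed selfless utility: total cash over all linked copies."""
    copies = {"heads": 1, "tails": 2}
    payout = {"heads": 0.0, "tails": 1.0}

    def utility(world: str, action: str) -> float:
        if action == "pass":
            return 0.0
        return copies[world] * (payout[world] - price)

    return utility


worlds = {"heads": 0.5, "tails": 0.5}
choice_at_60p = adt_choice(["buy", "pass"], worlds, make_selfless_utility(0.60))
choice_at_70p = adt_choice(["buy", "pass"], worlds, make_selfless_utility(0.70))
```

Note that no anthropic probability appears anywhere: the 0.5 weights are just the coin's, and the number of copies enters only through the utility.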

It can be seen that ADT obeys all the axioms used in the Sleeping Beauty problems, so it must reach the same conclusions as were reached there.

Linking non-identical agents

Now, module M is enough when the agents/algorithms are strictly identical, but fails when they differ slightly. For instance, imagine a variant of the selfless Sleeping Beauty problem where the two agents aren't exactly identical in tails world. The first agent has the same utility as before, while the second agent has some personal displeasure in engaging in trade -- if she buys the coupon, she will suffer a single -£0.05 penalty for doing so.

Then if the coupon is priced at £0.60, something quite interesting happens. If the agents do not believe they are linked, they will refuse the offer: their expected returns are 0.5(-0.6 + (1-0.6)) = -0.1 and -0.1-0.05=-0.15 respectively. If however they believe their decisions are linked, they will calculate the expected return from buying the coupon as 0.5 (-0.60 + 2(1-0.60)) = 0.1 and 0.1-0.05 = 0.05 respectively. Since these are positive, they will buy the coupon: meaning their assumption that they were linked was actually correct!

If the coupon is priced at £0.66, things change. If the two agents assume their decisions are linked, then they will calculate their expected return from buying the coupon as 0.5(-0.66 + 2(1-0.66)) = 0.01 and 0.01-0.05 = -0.04 respectively. The first agent will buy, and the second will not -- they were wrong to assume they were linked.
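The arithmetic for both prices can be checked with a few lines of code. This is only a sketch: the function name expected_returns and its parameters are made up for illustration, and it simply encodes the formulas given above, with the second agent paying the £0.05 trade penalty.

```python
def expected_returns(price: float, linked: bool, penalty: float = 0.05):
    """Expected return from buying, for the first and second agent.

    If the agents believe they are linked, each credits herself with the
    tails-world gain of both agents; otherwise only with her own.
    """
    copies_credited = 2 if linked else 1
    first = 0.5 * (-price + copies_credited * (1 - price))
    second = first - penalty  # the penalty is personal, not shared
    return first, second


unlinked_60 = expected_returns(0.60, linked=False)  # about (-0.10, -0.15)
linked_60 = expected_returns(0.60, linked=True)     # about (0.10, 0.05)
linked_66 = expected_returns(0.66, linked=True)     # about (0.01, -0.04)
```

At £0.60 the linked returns are both positive, so both agents buy and the assumption holds. At £0.66 the signs differ, so the agents would act differently and the assumption fails.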

A more general module that gives this kind of behaviour is:

Module N: Let H be the hypothesis that the decision of A ('myself') and those of algorithm B are linked. I will then compute what each of us will decide if we were both to accept H. If our ultimate decisions are indeed the same, and if the other agent also has a module N, then I will accept H.

The module N gives correct behaviour. It only triggers if the agents can prove that accepting H will ensure that H is true -- and then N makes them accept H, hence making H true.

For the coupon priced at £0.60, it will correctly tell them they are linked, and they will both buy it. For the coupon priced at £0.66, it will not trigger, and both will refuse to buy it -- though they reach the same decision, they would not have done so if they had assumed they were linked. For a coupon priced above £2/3, module N will correctly tell them they are linked again, and they will both refuse to buy it.
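The three price regimes can be reproduced with a small sketch of module N. It assumes that both agents carry module N and can each compute the other's decision; the names module_n, linked_return and unlinked_return are hypothetical.

```python
def linked_return(price: float, penalty: float) -> float:
    """Expected return from buying if the agent accepts H (linked)."""
    return 0.5 * (-price + 2 * (1 - price)) - penalty


def unlinked_return(price: float, penalty: float) -> float:
    """Expected return from buying if the agent rejects H (not linked)."""
    return 0.5 * (-price + (1 - price)) - penalty


def module_n(price: float, penalties=(0.0, 0.05)):
    """Return (accept_h, decisions); decisions[i] is True if agent i buys."""
    # What would each agent decide if both were to accept H?
    under_h = [linked_return(price, c) > 0 for c in penalties]
    # H is accepted only if accepting it makes the decisions agree.
    accept_h = len(set(under_h)) == 1
    if accept_h:
        return True, under_h
    # Otherwise each agent falls back on the unlinked calculation.
    return False, [unlinked_return(price, c) > 0 for c in penalties]


at_60 = module_n(0.60)  # H accepted, both buy
at_66 = module_n(0.66)  # H rejected, both refuse
at_70 = module_n(0.70)  # H accepted, both refuse
```

At £0.66 the agents end up making the same decision without being linked: the agreement comes from each agent's separate unlinked calculation, not from H.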

Regarding:

Anthropic Decision Theory (ADT): An agent should first find all the decisions linked with their own. Then they should maximise expected utility, acting as if they simultaneously controlled the outcomes of all linked decisions, and using the objective (non-anthropic) probabilities of the various worlds.

...is there any difference from Yudkowsky (2010) (appended below)...?

The timeless decision procedure evaluates expected utility conditional upon the output of an abstract decision computation - the very same computation that is currently executing as a timeless decision procedure - and returns that output such that the universe will possess maximum expected utility, conditional upon the abstract computation returning that output.

It's all related, but here deployed for the first time in Anthropic reasoning.

The other problem I see with this kind of material is that it seems kinda obvious. It basically says to maximise expected utility - with the reminder that identical deterministic calculations in different places should return the same outcome. However, most people already know that identical deterministic calculations performed in different places should return the same outcome - that's just uniformitarianism - something which is often taken for granted. Reminders of what we already know are OK - but they don't always add very much.

The other problem I see with this kind of material is that it seems kinda obvious

Then I've succeeded in my presentation. Nobody was saying what I was saying about anthropic behaviour until I started talking about it; if now it's kinda obvious, then that's great.

For instance, imagine a variant of the selfless Sleeping Beauty problem where the two agents aren't exactly identical in tails world. The first agent has the same utility as before, while the second agent has some personal displeasure in engaging in trade -- if she buys the coupon, she will suffer a single -£0.05 penalty for doing so.

And what happens in the heads world? I really can't figure out what you're doing here; it seems like you're assuming that both agents think that they would be guaranteed to exist in the heads world if they're not linked in the tails world, but that they only have a .5 chance of existing in the heads world if they are linked in the tails world.

This is how I see it: there are two agents A and A'. In the tails world, we have both, in the heads world, we have either A or A' with 50% probability.

In that case, I think you did your math wrong. Assume you are agent A. There is a .25 chance that the coin will land heads and that you will exist, a .25 chance that the coin will land heads and that you will not exist, and a .5 chance that the coin will land tails and you will exist. Updating on the fact that you do exist, you conclude that there is a 1/3 chance that the coin landed heads. Same goes for A'. Thus, the expected value of buying the coupon at £0.60 should be (1/3)(-0.6) + (2/3)(1-0.6) = +.07 and .07-.05 = +.02, respectively, and the expected value of buying the coupon at £0.66 is +.01 and -.04, respectively.

Updating on the fact that you do exist

This is precisely the sort of thing I don't want to do, as anthropic probabilities are not agreed upon. For instance:

There is a .25 chance that the coin will land heads and that you will exist, a .25 chance that the coin will land heads and that you will not exist, and a .5 chance that the coin will land tails and you will exist.

Replace the "you will exist" with "A will exist", and rewrite the ending to be "and a .5 chance that the coin will land tails and A will exist. Thus a .25 chance that the coin will land tails, A will exist, and you will be A. (in the heads world, A exists -> you are A)" But is this the right way to reason?

It's because questions like that are so confused that I used this approach.

Ah, so you're considering A and A' to be part of the same reference class in SSA.

It's because questions like that are so confused that I used this approach.

I can't even figure out what your approach is. How are you justifying these calculations (which I've fixed in the quote, I think. At least, if you actually wanted to do what you originally wrote instead, you have even more explaining to do.)?

Then if the coupon is priced at £0.60, something quite interesting happens. If the agents do not believe they are linked, they will refuse the offer: their expected returns are 0.5(-0.6 + (1-0.6)) = -0.1 and -0.1-0.05=-0.15 respectively. If however they believe their decisions are linked, they will calculate the expected return from buying the coupon as 0.33 (-0.60 + 2(1-0.60)) = 0.067 and 0.067-0.05 = 0.017 respectively.

Ah, so you're considering A and A' to be part of the same reference class in SSA.

I could be. The whole mess of reference classes is one of the problems in SSA.

As for the calculations: assume you are linked, so you and the other agent will make the same decision (if there is another agent). If you do not have the penalty from trade, "buy the coupon for 0.60" nets you -0.6 in the heads world, nets you personally 1-0.4 in the tails world, and nets the other agent in the tails world 1-0.4 (you do not care about their pain from trading, since you are selfless, not altruistic). You both are selfless and in agreement with this money, so the cash adds up in the tails world: 2(1-0.4). Then plugging in probabilities gives 0.067.

If you have the penalty from trade, simply subtract it from all your gains (again, the penalty from trade is only yours, and is not shared).

If you assume you are not linked, then you do not claim the extra 1-0.4 of the other agent as being part of your achievement, so simply get 0.5(-0.6 + (1-0.6)) = -0.1, plus the penalty from trade.

Oh, I see. So you are assuming these utility functions:

A: sum of profits for all copies of A or A', not counting (A')'s trade penalty.

A': sum of profits for all copies of A or A', minus .05 if this particular agent trades.

Now that I know what you meant, I can even tell that your original text implies these utility functions, but it would have helped if you had been more explicit. I had jumped to the conclusion that both agents were selfish when I noticed that A did not take (A')'s trade penalty into account. Anyway, your original calculation appears to be correct using ADT and those utility functions, so you can disregard my attempted corrections. I'm assuming that when you said 1-0.4 in your reply, you meant 1-0.6.

I can even tell that your original text implies these utility functions, but it would have helped if you had been more explicit.

Thank you, that is a very useful comment, I will try and clarify in the rewrite.
