Comment author: Gurkenglas 08 December 2014 04:25:44PM *  0 points [-]

If I am choosing the algorithm that all civilisations are going to follow, then if one civilisation succeeded, that would lead to large positive utilities for all future civilisations. Why would I let the game end?

Comment author: Beluga 08 December 2014 10:56:08PM 1 point [-]

Not sure I understand your question, but:

  • I assume that each civilization only cares about itself. So one civilization succeeding does not "lead to large positive utilities for all future civilisations", only for itself. If civilization A assigns positive or negative value to civilization B succeeding, the expected utility calculations become more complicated.
  • You cannot "let the game end". The fact that the game ends when one player receives R only represents the fact that each player knows that no previous player has received R (i.e., we arguably know that no civilization so far has successfully colonized space in our neighborhood).
Comment author: NancyLebovitz 07 December 2014 03:39:57PM 3 points [-]

If a player pushes the button and receives R, the game is immediately aborted, while the game continues if a player receives R.

I think you have a typo-- it should be "if a player receives P".

Comment author: Beluga 08 December 2014 12:46:28AM 0 points [-]

Thanks, I fixed it.

Linked decisions and a "nice" solution for the Fermi paradox

2 Beluga 07 December 2014 02:58PM

One of the more speculative solutions to the Fermi paradox is that all civilizations decide to stay home, thereby meta-causing other civilizations to stay home too, and thus allowing the Fermi paradox to have a nice solution. (I remember reading this idea in Paul Almond’s writings about evidential decision theory, which unfortunately no longer seem to be available online.) The plausibility of this argument is definitely questionable. It requires a very high degree of goal convergence both within and among different civilizations. Let us grant this convergence and assume that, indeed, most civilizations arrive at the same decision and that they make their decision knowing this. One paradoxical implication then is: If a civilization decides to attempt space colonization, they are virtually guaranteed to face unexpected difficulties (for otherwise space would already be colonized, unless they are the first civilization in their neighborhood attempting space colonization). If, on the other hand, everyone decides to stay home, there is no reason for thinking that there would be any unexpected difficulties if one tried. Space colonization can either be easy, or you can try it, but not both.

Can the basic idea behind the argument be formalized? Consider the following game: There are N>>1 players. Each player in turn is offered the chance to push a button. Pushing the button yields a reward R>0 with probability p and a punishment P<0 otherwise. (R corresponds to successful space colonization while P corresponds to a failed colonization attempt.) Not pushing the button gives zero utility. If a player pushes the button and receives R, the game is immediately aborted, while the game continues if a player receives P. Players do not know how many other players were offered the button before them; they only know that no player before them received R. Players also don’t know p. Instead, they have a probability distribution u(p) over possible values of p. (u(p)>=0 and the integral of u(p) from 0 to 1 is given by int_{0}^{1}u(p)dp=1.) We also assume that the decisions of the different players are perfectly linked.

Naively, it seems that players simply have an effective success probability p_eff,1=int_{0}^{1}p*u(p)dp and they should push the button iff p_eff,1*R+(1-p_eff,1)*P>0. Indeed, if players decide not to push the button, they should expect that pushing the button would have given them R with probability p_eff,1. The situation becomes more complicated if a player decides to push the button. If a player pushes the button, they know that all players before them have also pushed the button and have received P. Before taking this knowledge into account, players are completely ignorant about the number i of players who were offered the button before them, and have to assign each number i from 0 to N-1 the same probability 1/N. Taking into account that all players before them have received P, the variables i and p become correlated: the larger i, the higher the probability of a small value of p. Formally, the joint probability distribution w(i,p) for the two variables is, according to Bayes’ theorem, given by w(i,p)=c*u(p)*(1-p)^i, where c is a normalization constant. The marginal distribution w(p) is given by w(p)=sum_{i=0}^{N-1}w(i,p). The sum over i is a geometric series, sum_{i=0}^{N-1}(1-p)^i=[1-(1-p)^N]/p, so using N>>1 we find w(p)=c*u(p)/p. The normalization constant is thus c=[int_{0}^{1}u(p)/p*dp]^{-1}. Finally, we find that the effective success probability taking the linkage of decisions into account is given by

p_eff,2 = int_{0}^{1}p*w(p)dp = c = [int_{0}^{1}u(p)/p*dp]^{-1} .

This is the expected chance of success if players decide to push the button. Players should push the button iff p_eff,2*R+(1-p_eff,2)*P>0. It follows from convexity of the function x->1/x (for positive x) that p_eff,2<=p_eff,1. So by deciding to push the button, players decrease their expected success probability from p_eff,1 to p_eff,2; they cannot both push the button and have the unaltered success probability p_eff,1. Linked decisions can explain why no one pushes the button if p_eff,2*R+(1-p_eff,2)*P<0, even though we might have p_eff,1*R+(1-p_eff,1)*P>0 and pushing the button naively seems to have positive expected utility.

It is also worth noting that if u(0)>0, the integral int_{0}^{1}u(p)/p*dp diverges, so that p_eff,2=0. This means that given perfectly linked decisions and a sufficiently large number of players N>>1, players should never push the button if their distribution u(p) satisfies u(0)>0, irrespective of the ratio of R and P. This is due to an observer selection effect: if a player decides to push the button, then the fact that they are even offered the button is most likely due to p being very small, and thus to a lot of players being offered the button before anyone succeeds.
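A minimal numerical sketch of these two quantities (illustrative Python using SciPy; the Beta(2,5) prior is just an arbitrary example satisfying u(0)=0):

```python
# Sketch: compare p_eff,1 and p_eff,2 for an example prior u(p).
# The Beta(2, 5) prior is an arbitrary illustration with u(0) = 0,
# so both integrals below converge.
from scipy import integrate
from scipy.stats import beta

u = beta(2, 5).pdf

# Naive effective success probability: p_eff,1 = E[p] under the prior.
p_eff_1, _ = integrate.quad(lambda p: p * u(p), 0, 1)

# Linked-decision effective success probability in the N >> 1 limit:
# p_eff,2 = [ int_0^1 u(p)/p dp ]^(-1)
inv_moment, _ = integrate.quad(lambda p: u(p) / p, 0, 1)
p_eff_2 = 1.0 / inv_moment

print(p_eff_1, p_eff_2)  # ~0.286 and ~0.167: p_eff,2 < p_eff,1, as convexity of 1/x implies
# For a prior with u(0) > 0 (e.g. the uniform prior beta(1, 1)), the second
# integral diverges and p_eff,2 -> 0, reproducing the observer selection effect.
```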

Comment author: lackofcheese 24 October 2014 03:52:51PM *  2 points [-]

OK, time for further detail on the problem with pre-emptively submissive gnomes. Let's focus on the case of total utilitarianism, and begin by looking at the decision in unlinked form, i.e. we assume that the gnome's advice affects only one human if there is one in the room, and zero humans otherwise. Conditional on there being a human in cell B, the expected utility of the human in cell B buying a ticket for $x is, indeed, (1/3)(-x) + (2/3)(1-x) = 2/3 - x, so the breakeven is obviously at x = 2/3. However, if we also assume that the gnome in the other cell will give the same advice, we get (1/3)(-x) + 2(2/3)(1-x) = 4/3 - (5/3)x, with breakeven at x=4/5. In actual fact, the gnome's reasoning, and the 4/5 answer, is correct. If tickets were being offered at a price of, say, 75 cents, then the overall outcome (conditional on there being a human in cell B) is indeed better if the humans buy at 75 cents than if they refuse to buy at 75 cents, because 3/4 is less than 4/5.

As I mentioned previously, in the case where the gnome only cares about total $ if there is a human in its cell, then 4/5 is correct before conditioning on the presence of a human, and it's also correct after conditioning on the presence of a human; the number is 4/5 regardless. However, the situation we're examining here is different, because the gnome cares about total $ even if no human is present. Thus we have a dilemma, because it appears that UDT is correct in advising the gnome to precommit to 2/3, but the above argument also suggests that after seeing a human in its cell it is correct for the gnome to advise 4/5.
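To make the dilemma concrete, here is a minimal Python sketch (purely illustrative) evaluating both expected values at the 75-cent price for a total-utilitarian gnome:

```python
# Sketch: total-$ change across both cells for the policy "both humans buy at price x",
# evaluated (a) conditional on a human being present in this gnome's cell and
# (b) ex ante, over the gnome's four equiprobable worlds.

def eu_conditional_on_human(x):
    # P(heads | human here) = 1/3: one buyer, ticket worthless -> -x
    # P(tails | human here) = 2/3: two buyers, each nets 1 - x
    return (1/3) * (-x) + (2/3) * 2 * (1 - x)                  # breakeven at x = 4/5

def eu_ex_ante(x):
    # heads, human here (1/4): -x ; heads, human in other cell (1/4): -x
    # tails, either cell (1/2): 2 * (1 - x)
    return (1/4) * (-x) + (1/4) * (-x) + (1/2) * 2 * (1 - x)   # breakeven at x = 2/3

x = 0.75
print(eu_conditional_on_human(x))  # +0.083...: buying looks good after conditioning
print(eu_ex_ante(x))               # -0.125: buying looks bad from the precommitment view
```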

The key distinction, analogously to mwenger's answer to Psy-Kosh's non-anthropic problem, has to do with the possibility of a gnome in an empty cell. For a total utilitarian gnome in an empty cell, any money at all spent in the other cell translates directly into negative utility. That gnome would prefer the human in the other cell to spend $0 at most, but of course there is no way to make this happen, since the other gnome has no way of knowing that this is the case.

The resolution to this problem is that, for linked decisions, you must (as UDT does) necessarily consider the effects of that decision over all a priori possible worlds affected by that decision. As it happens, this is the same thing as what you would do if you had the opportunity to precommit in advance.

It's a bit trickier to justify why this should be the case, but the best argument I can come up with is to apply that same "linked decision" reasoning at one meta-level up, the level of "linked decision theories". In short, by adopting a decision theory that ignores linked decisions in a priori possible worlds that are excluded by your observations, you are licensing yourself and other agents to do the same thing in future decisions, which you don't want. If other agents follow this reasoning, they will give the "yea" answer in Psy-Kosh's non-anthropic problem, but you don't want them to do that.

Note that decisions in worlds excluded by your observations are not usually "linked". This is because exclusion by observation would usually imply that you receive a different observation in the other possible world, thus allowing you to condition your decision on that observation, and thereby unlinking the decisions. However, some rare problems like the Counterfactual Mugging and Psy-Kosh's non-anthropic problem violate this tendency, and should therefore be treated differently.

Overall, then, the "linked decision theory" argument supports adopting UDT, and it means that you should consider all linked decisions in all a priori possible worlds.

Comment author: Beluga 25 October 2014 05:45:55PM *  1 point [-]

Thanks a lot for your comments, they were very insightful for me. Let me play the Advocatus Diaboli here and argue from the perspective of a selfish agent against your reasoning (and thus also against my own, less refined version of it).

"I object to the identification 'S = $B'. I do not care about the money owned by the person in cell B, I only do so if that person is me. I do not know whether the coin has come up heads or tails, but I do not care about how much money the other person that may have been in cell B had the coin come up differently would have paid or won. I only care about the money owned by the person in cell B in "this world", where that person is me. I reject identifying myself with the other person that may have been in cell B had the coin come up differently, solely because that person would exist in the same cell as I do. My utility function thus cannot be expressed as a linear combination of $B and $C.

I would pay a counterfactual mugger. In that case, there is a transfer, as it were, between two possible selves of mine that increases "our" total fortune. We are both possible descendants of the same past-self, to which each of us is connected identically. The situation is quite different in the incubator case. There is no connection over a mutual past self between me and the other person that may have existed in cell B after a different outcome of the coin flip. This connection between past and future selves of mine is exactly what specifies my selfish goals. Actually, I don't feel like the person that may have existed in cell B after a different outcome of the coin flip is "me" any more than the person in cell C is "me" (if that person exists). Since I will pay and win as much as the person in cell C (if they exist), I cannot win any money from them; and since I don't care about whether they exist at all, I think I should decide as an average utilitarian would. I will not pay more than $0.50."

Is the egoist arguing this way mistaken? Or is our everyday notion of selfishness just not uniquely defined when it comes to the possibility of subjectively indistinguishable agents living in different "worlds", since it rests on the dubious concept of personal identity? Can one understand selfishness both as caring about everyone living in subjectively identical circumstances to oneself (and their future selves), and as caring only about those to whom one is directly connected? Do these two possibilities correspond to SIA-egoists and SSA-egoists, respectively, both of which are coherent possibilities?

Comment author: Stuart_Armstrong 23 October 2014 11:40:41AM *  2 points [-]

A minimal non-anthropic example that illustrates the difference

The decision you describe is not stable under pre-commitments. Ahead of time, all agents would pre-commit to the $2/3. Yet they seem to change their mind when presented with the decision. You seem to be double counting, using the Bayesian update once and then also the fact that their own decision determines the other agent's decision.

In the terminology of the paper http://www.fhi.ox.ac.uk/anthropics-why-probability-isnt-enough.pdf , your agents are altruists using linked decisions with total responsibility and no precommitments, which is a foolish thing to do. If they were altruists using linked decisions with divided responsibility (or if they used precommitments), everything would be fine (I don't like or use that old terminology - UDT does it better - but it seems relevant here).

But that's detracting from the main point: I still don't see any difference between indexical and non-indexical total utilitarianism. I don't see why a non-indexical total utilitarian can't follow the wrong reasoning you used in your example just as well as an indexical one, if either of them can - and similarly for the right reasoning.

Comment author: Beluga 24 October 2014 03:47:55PM 0 points [-]

The decision you describe is not stable under pre-commitments. Ahead of time, all agents would pre-commit to the $2/3. Yet they seem to change their mind when presented with the decision. You seem to be double counting, using the Bayesian update once and then also the fact that their own decision determines the other agent's decision.

Yes, this is exactly the point I was trying to make -- I was pointing out a fallacy. I never intended "indexicality-dependent utilitarianism" to be a meaningful concept; it's only a name for thinking in this fallacious way.

Comment author: Stuart_Armstrong 22 October 2014 05:30:11PM 1 point [-]

I'm still not clear why indexicality-independent utility functions are different from their equivalent indexical versions.

Comment author: Beluga 22 October 2014 08:02:29PM 1 point [-]

I elaborated on this difference here. However, I don't think this difference is relevant for my parent comment. With indexical utility functions I simply mean selfishness or "selfishness plus hating the other person if another person exists", while with indexicality-independent utility functions I mean total and average utilitarianism.

Comment author: Stuart_Armstrong 22 October 2014 02:46:06PM 1 point [-]

The broader question is "does bringing in gnomes in this way leave the initial situation invariant"? And I don't think it does. The gnomes follow their own anthropic setup (though not their own preferences), and their advice seems to reflect this fact (consider what happens when the heads world has 1, 2 or 50 gnomes, while the tails world has 2).

I also don't see your indexical objection. The sleeping beauty could perfectly have an indexical version of total utilitarianism ("I value my personal utility, plus that of the sleeping beauty in the other room, if they exist"). If you want to proceed further, you seem to have to argue that indexical total utilitarianism gives different decisions than standard total utilitarianism.

This is odd, as it seems a total utilitarian would not object to having their utility replaced with the indexical version, and vice-versa.

Comment author: Beluga 22 October 2014 07:53:33PM *  1 point [-]

The broader question is "does bringing in gnomes in this way leave the initial situation invariant"? And I don't think it does. The gnomes follow their own anthropic setup (though not their own preferences), and their advice seems to reflect this fact (consider what happens when the heads world has 1, 2 or 50 gnomes, while the tails world has 2).

As I wrote (after your comment) here, I think it is prima facie very plausible for a selfish agent to follow the gnome's advice if a) conditional on the agent existing, the gnome's utility function agrees with the agent's, and b) conditional on the agent not existing, the gnome's utility function is a constant. (I didn't have condition b) explicitly in mind, but your example showed that it's necessary.) Having the number of gnomes depend upon the coin flip defeats their purpose. The very point of the gnomes is that from their perspective, the problem is not "anthropic", but a decision problem that can be solved using UDT.

I also don't see your indexical objection. The sleeping beauty could perfectly have an indexical version of total utilitarianism ("I value my personal utility, plus that of the sleeping beauty in the other room, if they exist"). If you want to proceed further, you seem to have to argue that indexical total utilitarianism gives different decisions than standard total utilitarianism.

That's what I tried to do in the parent comment. To be clear, I did not mean "indexical total utilitarianism" to be a meaningful concept, but rather a wrong way of thinking, a trap one can fall into. Very roughly, it corresponds to thinking of total utilitarianism as "I care for myself plus any other people that might exist" instead of "I care for all people that exist". What's the difference, you ask? A minimal non-anthropic example that illustrates the difference would be very much like the incubator, but without people being created. Imagine 1000 total utilitarians with identical decision algorithms waiting in separate rooms. After the coin flip, either one (heads) or two (tails) of them are offered the chance to buy a ticket that pays $1 after tails. When asked, the agents can correctly perform a non-anthropic Bayesian update to conclude that the probability of tails is 2/3. An indexical total utilitarian reasons: "If the coin has shown tails, another agent will pay the same amount $x that I pay and win the same $1, while if the coin has shown heads, I'm the only one who pays $x. The expected utility of paying $x is thus 1/3 * (-x) + 2/3 * 2 * (1-x)." This leads to the incorrect conclusion that one should pay up to $4/5. The correct (UDT-) way to think about the problem is that after tails, one's decision algorithm is called twice. There's only one factor of 2, not two of them. This is all very similar to this post.
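A minimal Monte Carlo sketch of this non-anthropic example (illustrative Python; the two prices are chosen only to bracket the 2/3 breakeven):

```python
# Sketch: the ticket costs x and pays $1 on tails; one agent is offered the ticket
# on heads, two on tails, and all offered agents follow the same policy "buy".
# The average total winnings show that buying helps iff x < 2/3, not x < 4/5.
import random

def avg_total_winnings(x, trials=200_000):
    total = 0.0
    for _ in range(trials):
        tails = random.random() < 0.5
        buyers = 2 if tails else 1          # how many agents are offered (and buy)
        payoff = 1.0 if tails else 0.0      # ticket pays only on tails
        total += buyers * (payoff - x)
    return total / trials

print(avg_total_winnings(0.60))  # ~ +0.10: below the 2/3 breakeven, buying helps
print(avg_total_winnings(0.75))  # ~ -0.125: above 2/3 (though below 4/5), buying hurts
```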

To put this again into context: You argued that selfishness is a 50/50 mixture of hating the other person, if another person exists, and total utilitarianism. My reply was that this is only true if one understands total utilitarianism in the incorrect, indexical way. I formalized this as follows: Let the utility function of a hater be vh - h * vo (here, vh is the agent's own utility, vo the other person's utility, and h is 1 if the other person exists and 0 otherwise). Selfishness would be a 50/50 mixture of hating and total utilitarianism if the utility function of a total utilitarian were vh + h * vo. However, this is exactly the wrong way of formalizing total utilitarianism. It leads, again, to the conclusion that a total utilitarian should pay up to $4/5.

Comment author: Stuart_Armstrong 22 October 2014 12:55:51PM 1 point [-]

Right now let's modify the setup a bit, targeting that one vulnerable gnome who sees no human in the heads world.

First scenario: there is no such gnome. The number of gnomes is also determined by the coin flip, so every gnome will see a human. Then if we apply the reasoning from http://lesswrong.com/r/discussion/lw/l58/anthropic_decision_theory_for_selfish_agents/bhj7 , this will result in the gnome with a selfish human agreeing to x<$1/2.

Instead, let's now make the gnome in the heads world hate the other human, if they don't have one themselves. The result of this is that they will agree to any x<$1, as they are (initially) indifferent to what happens in the heads world (potential gains, if they are the gnome with a human, are cancelled out by the potential loss, if they are the gnome without the human).

So it seems to me that the situation is most likely an artefact of the number and particular motivations of the gnomes (notice I never changed the motivations of gnomes who would encounter a human, only the "unimportant extra" one).

Comment author: Beluga 22 October 2014 03:47:02PM *  1 point [-]

First scenario: there is no such gnome. The number of gnomes is also determined by the coin flip, so every gnome will see a human. Then if we apply the reasoning from http://lesswrong.com/r/discussion/lw/l58/anthropic_decision_theory_for_selfish_agents/bhj7 , this will result in the gnome with a selfish human agreeing to x<$1/2.

If the gnomes are created after the coin flip only, they are in exactly the same situation as the humans, and we cannot learn anything by considering them that we cannot learn from considering the humans alone.

Instead, let's now make the gnome in the heads world hate the other human, if they don't have one themselves. The result of this is that they will agree to any x<$1, as they are (initially) indifferent to what happens in the heads world (potential gains, if they are the gnome with a human, are cancelled out by the potential loss, if they are the gnome without the human).

What this shows is that "Conditional on me existing, the gnome's utility function coincides with mine" is not a sufficient condition for "I should follow the advice that the gnome would have precommitted to give".

What I propose instead is: "If conditional on me existing the gnome's utility function coincides with mine, and conditional on me not existing the gnome's utility function is a constant, then I should follow the advice that the gnome would have precommitted to."

ETA: Speaking of indexicality-dependent utility functions here. For indexicality-independent utility functions, such as total or average utilitarianism, the principle simplifies to: "If the gnome's utility function coincides with mine, then I should follow the advice that the gnome would have precommitted to."

Comment author: Stuart_Armstrong 21 October 2014 09:10:13PM 2 points [-]

Ok, I don't like gnomes making current decisions based on their future values. Let's make it simpler: the gnomes have a utility function linear in the money owned by person X. Person X will be the person who appears in their (the gnome's) room, or, if no-one appeared, some other entity irrelevant to the experiment.

So now the gnomes have subjectively indistinguishable utility functions, and know they will reach the same decision upon seeing "their" human. What should this decision be?

If they advise "buy the ticket for price $x", then they expect to lose $x with probability 1/4 (heads world, they see a human), lose/gain nothing with probability 1/4 (heads world, they don't see a human), and gain $1-x with probability 1/2 (tails world). So this gives an expected gain of 1/2-(3/4)x, which is zero for x=$2/3.

So this seems to confirm your point.

"Not so fast!" shouts a voice in the back of my head. That second head-world gnome, the one who never sees a human, is a strange one. If this model is vulnerable, it's there.

So let's do without gnomes for a second. The incubator always creates two people, but in the heads world, the second person can never gain (nor lose) anything, no matter what they agree to: any deal is nullified. This seems like a gnome setup without the gnomes. If everyone is an average utilitarian, then they will behave exactly as the total utilitarians would (since population is equal anyway) and buy the ticket for x<$2/3. So this setup has changed the outcome for average utilitarians. If it's the same as the gnome setup (and it seems to be), then the gnome setup is interfering with the decisions in cases we know about. The fact that the number of gnomes is fixed is the likely cause.

I'll think more about it, and post tomorrow. Incidentally, one reason for the selfish = average utilitarian equivalence is that I sometimes model selfish as the average between total utilitarian incubator and anti-incubator (where the two copies hate each other in the tail world). 50%-50% on total utilitarian vs hatred seems to be a good model of selfishness, and gives the x<$1/2 answer.

Comment author: Beluga 22 October 2014 01:58:11PM *  2 points [-]

Thanks for your reply.

Ok, I don't like gnomes making current decisions based on their future values.

For the selfish case, we can easily get around this by defining the gnome's utility function to be the amount of $ in the cell. If we stipulate that this can only change through humans buying lottery tickets (and winning lotteries) and that humans cannot leave the cells, the gnome's utility function coincides with the human's. Similarly, we can define the gnome's utility function to be the amount of $ in all cells (the average amount of $ in those cells inhabited by humans) in the total (average) utilitarian case.

This seems to be a much neater way of using the gnome heuristic than the one I used in the original post, since the gnome's utility function is now unchanging and unconditional. The only issue seems to be that before the humans are created, the gnome's utility function is undefined in the average utilitarian case ("0/0"). However, this is more a problem of average utilitarianism than of the heuristic per se. We can get around it by defining the utility to be 0 if there aren't any humans around yet.

The incubator always creates two people, but in the heads world, the second person can never gain (nor lose) anything, no matter what they agree to: any deal is nullified. This seems like a gnome setup without the gnomes. If everyone is an average utilitarian, then they will behave exactly as the total utilitarians would (since population is equal anyway) and buy the ticket for x<$2/3. So this setup has changed the outcome for average utilitarians. If it's the same as the gnome setup (and it seems to be), then the gnome setup is interfering with the decisions in cases we know about. The fact that the number of gnomes is fixed is the likely cause.

I don't follow. As I should have written in the original post, total/average utilitarianism includes of course the wellbeing and population of humans only, not of gnomes. Otherwise, it's trivial that the presence of gnomes affects the conclusions. That the presence of an additional human affects the conclusion for average utilitarians is not surprising, since in contrast to the presence of gnomes, an additional human changes the relevant population.

Incidentally, one reason for the selfish = average utilitarian equivalence is that I sometimes model selfish as the average between total utilitarian incubator and anti-incubator (where the two copies hate each other in the tail world). 50%-50% on total utilitarian vs hatred seems to be a good model of selfishness, and gives the x<$1/2 answer.

Hm, so basically one could argue as follows against my conclusion that both selfish and total utilitarians pay up to $2/3: A hater wouldn't pay anything for a ticket that pays $1 in the tails world. Since selfishness is a mixture of total utilitarianism and hating, a selfish person certainly cannot have the same maximal price as a total utilitarian.

However, I feel like "caring about the other person in the tail world in a total utilitarian sense" and "hating the other person in the tail world" are not exactly mirror images of each other. The difference is that total utilitarianism is indexicality-independent, while "hating the other person" isn't. My claim is: However you formalize "hating the person in the other room in the tail world" and "being a total utilitarian", the statements "a total utilitarian pays up to $2/3" and "selfishness is a mixture of total utilitarianism and hating" and "a hater would not pay more than $0 for the ticket" are never simultaneously true.

Imagine that the human formally writes down their utility function in order to apply the "if there were a gnome in my room, what maximal price to pay would it advise me, after asking itself what advice it would have precommitted to?" heuristic. We introduce the variables 'vh' and 'vo' for "$-value in this/the other room". These are 0 if there's no human, -x after buying a ticket after heads, and 1-x after buying a ticket after tails. We also introduce a variable 't' which is 1 after tails and 0 after heads.

We can then write down the following utility functions with their respective expectation values (from the point of view of the gnome before the coin flip):

egoist: vh => 1/4 * (-x+0+(1-x)+(1-x))

total ut.: vh + t * vo => 1/4 * (-x+0+2 * (1-x)+2 * (1-x))

hate: vh - t * vo => 1/4 * (-x+0+0+0)

Here, we see that egoism is indeed a mixture of total utilitarianism and hating, the egoist pays up to 2/3, and the hater pays nothing. However, according to this definition of total utilitarianism, a t.u. should pay up to 4/5. Its utility function is indexicality-dependent (the variable t enters only the utility coming from the other person), in contrast to true total utilitarianism.

In order to write down an indexicality-independent utility function, we introduce new variables 'nh' and 'no', the number of people here and in the other room (0 or 1). Then, we could make the following definitions:

egoist: nh * vh
total ut.: nh * vh + no * vo
hate: nh * vh - no * vo

(The 'nh' and 'no' factors are actually redundant, since 'vh' is defined to be zero if 'nh' is.)

With these definitions, both an egoist and a t.u. pay up to 2/3, and egoism is a mixture of t.u. and hating. However, the expected utility of a hater is now 0 independent of x, such that there is no longer a contradiction. The reason is that we now count the winnings of the single heads-world human once positively (if ze is in our room) and once negatively (if ze is in the other room). This isn't what we meant by hating, so we could modify the utility function of the hater as follows:

hate: nh * (vh - no * vo)

This reproduces again what we mean by hating (it is equivalent to the old definition 'vh - t * vo'), but now egoism is no longer a combination of hating and t.u.

In conclusion, it doesn't seem to be possible to derive a contradiction between "a hater wouldn't pay anything for a lottery ticket" and "both egoists and total utilitarians would pay up to $2/3".
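For completeness, here is a minimal Python sketch (an illustration of the definitions above, with my own labels for the variants) that enumerates the gnome's four equiprobable worlds and recovers the breakeven price for each utility function:

```python
# Sketch: each world is (t, nh, no) with t = 1 on tails, nh/no = 1 if a human is
# present in this/the other room. The expected utility of "buy at price x" is
# linear in x, so the breakeven price is where it crosses zero.
from fractions import Fraction

def values(x, t, nh, no):
    # $-value in this room / the other room: 0 if no human there,
    # -x after buying on heads, 1 - x after buying on tails.
    vh = nh * ((1 - x) if t else -x)
    vo = no * ((1 - x) if t else -x)
    return vh, vo

worlds = [(0, 1, 0), (0, 0, 1), (1, 1, 1), (1, 1, 1)]  # heads here, heads other, tails x2

utilities = {
    "egoist (vh)":                   lambda vh, vo, t, nh, no: vh,
    "indexical t.u. (vh+t*vo)":      lambda vh, vo, t, nh, no: vh + t * vo,
    "indexical hate (vh-t*vo)":      lambda vh, vo, t, nh, no: vh - t * vo,
    "true t.u. (nh*vh+no*vo)":       lambda vh, vo, t, nh, no: nh * vh + no * vo,
    "hate, 2nd try (nh*vh-no*vo)":   lambda vh, vo, t, nh, no: nh * vh - no * vo,
    "hate, 3rd try (nh*(vh-no*vo))": lambda vh, vo, t, nh, no: nh * (vh - no * vo),
}

def expected_utility(u, x):
    return sum(u(*values(x, t, nh, no), t, nh, no) for t, nh, no in worlds) / Fraction(4)

for name, u in utilities.items():
    a, b = expected_utility(u, Fraction(0)), expected_utility(u, Fraction(1))
    breakeven = "any x (EU is always 0)" if a == b else -a / (b - a)
    print(name, breakeven)
# egoist: 2/3, indexical t.u.: 4/5, indexical hate: 0,
# true t.u.: 2/3, hate 2nd try: any x, hate 3rd try: 0
```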

Anthropic decision theory for selfish agents

8 Beluga 21 October 2014 03:56PM

Consider Nick Bostrom's Incubator Gedankenexperiment, phrased as a decision problem. In my mind, this provides the purest and simplest example of a non-trivial anthropic decision problem. In an otherwise empty world, the Incubator flips a coin. If the coin comes up heads, it creates one human, while if the coin comes up tails, it creates two humans. Each created human is put into one of two indistinguishable cells, and there's no way for created humans to tell whether another human has been created or not. Each created human is offered the possibility to buy a lottery ticket which pays 1$ if the coin has shown tails. What is the maximal price that you would pay for such a lottery ticket? (Utility is proportional to Dollars.) The two traditional answers are 1/2$ and 2/3$.

We can try to answer this question for agents with different utility functions: total utilitarians; average utilitarians; and selfish agents. UDT's answer is that total utilitarians should pay up to 2/3$, while average utilitarians should pay up to 1/2$; see Stuart Armstrong's paper and Wei Dai's comment. There are some heuristic ways to arrive at UDT prescriptions, such as asking "What would I have precommitted to?" or arguing based on reflective consistency. For example, a CDT agent that expects to face Counterfactual Mugging-like situations in the future (with predictions also made in the future) will self-modify to become a UDT agent, i.e., one that pays the counterfactual mugger.

Now, these kinds of heuristics are not applicable to the Incubator case. It is meaningless to ask "What maximal price should I have precommitted to?" or "At what odds should I bet on coin flips of this kind in the future?", since the very point of the Gedankenexperiment is that the agent's existence is contingent upon the outcome of the coin flip. Can we come up with a different heuristic that leads to the correct answer? Imagine that the Incubator's subroutine that is responsible for creating the humans is completely benevolent towards them (let's call this the "Benevolent Creator"). (We assume here that the humans' goals are identical, such that the notion of benevolence towards all humans is completely unproblematic.) The Benevolent Creator has the power to program a certain maximal price the humans pay for the lottery tickets into them. A moment's thought shows that this indeed leads to UDT's answers for average and total utilitarians. For example, consider the case of total utilitarians. If the humans pay x$ for the lottery tickets, the expected utility is 1/2*(-x) + 1/2*2*(1-x). So indeed, the break-even price is reached for x=2/3.

But what about selfish agents? For them, the Benevolent Creator heuristic is no longer applicable. Since the humans' goals do not align, the Creator cannot share them. As Wei Dai writes, the notion of selfish values does not fit well with UDT. In Anthropic decision theory, Stuart Armstrong argues that selfish agents should pay up to 1/2$ (Sec. 3.3.3). His argument is based on an alleged isomorphism between the average utilitarian and the selfish case. (For instance, donating 1$ to each human increases utility by 1 for both average utilitarian and selfish agents, while it increases utility by 2 for total utilitarians in the tails world.) Here, I want to argue that this is incorrect and that selfish agents should pay up to 2/3$ for the lottery tickets.

(Needless to say that all the bold statements I'm about to make are based on an "inside view". An "outside view" tells me that Stuart Armstrong has thought much more carefully about these issues than I have, and has discussed them with a lot of smart people, which I haven't, so chances are my arguments are flawed somehow.)

In order to make my argument, I want to introduce yet another heuristic, which I call the Submissive Gnome. Suppose each cell contains a gnome which is already present before the coin is flipped. As soon as it sees a human in its cell, it instantly adopts the human's goal. From the gnome's perspective, SIA odds are clearly correct: Since a human is twice as likely to appear in the gnome's cell if the coin shows tails, Bayes' Theorem implies that the probability of tails is 2/3 from the gnome's perspective once it has seen a human. Therefore, the gnome would advise the selfish human to pay up to 2/3$ for a lottery ticket that pays 1$ in the tails world. I don't see any reason why the selfish agent shouldn't follow the gnome's advice. From the gnome's perspective, the problem is not even "anthropic" in any sense; there's just straightforward Bayesian updating.
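A minimal sketch of the gnome's update (illustrative Python, assuming the setup above):

```python
# Sketch: the gnome's Bayesian update upon seeing a human in its cell, and the
# resulting breakeven price for the selfish human it then advises.
prior_tails = 0.5
p_human_given_heads = 0.5   # on heads, the single human lands in this cell with prob 1/2
p_human_given_tails = 1.0   # on tails, both cells contain a human

posterior_tails = (prior_tails * p_human_given_tails) / (
    prior_tails * p_human_given_tails + (1 - prior_tails) * p_human_given_heads
)

# Selfish expected gain of buying at price x is posterior*(1-x) - (1-posterior)*x
# = posterior - x, so the breakeven price equals the posterior probability of tails.
print(posterior_tails)  # 0.666..., i.e. pay up to 2/3$
```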

Suppose we want to use the Submissive Gnome heuristic to solve the problem for utilitarian agents. (ETA: Total/average utilitarianism includes the well-being and population of humans only, not of gnomes.) The gnome reasons as follows: "With probability 2/3, the coin has shown tails. For an average utilitarian, the expected utility after paying x$ for a ticket is 1/3*(-x)+2/3*(1-x), while for a total utilitarian the expected utility is 1/3*(-x)+2/3*2*(1-x). Average and total utilitarians should thus pay up to 2/3$ and 4/5$, respectively." The gnome's advice disagrees with UDT and the solution based on the Benevolent Creator. Something has gone terribly wrong here, but what? The mistake in the gnome's reasoning here is in fact perfectly isomorphic to the mistake in the reasoning leading to the "yea" answer in Psy-Kosh's non-anthropic problem.

Things become clear if we look at the problem from the gnome's perspective before the coin is flipped. Assume, for simplicity, that there are only two cells and two gnomes, 1 and 2. If the coin shows heads, the single human is placed in cell 1 and cell 2 is left empty. Since the humans don't know in which cell they are, neither should the gnomes know. So from each gnome's perspective, there are four equiprobable "worlds": it can be in cell 1 or 2 and the coin flip can result in heads or tails. We assume, of course, that the two gnomes are, like the humans, sufficiently similar such that their decisions are "linked".

We can assume that the gnomes already know what utility functions the humans are going to have. If the humans will be (total/average) utilitarians, we can then even assume that the gnomes already are so, too, since the well-being of each human is as important as that of any other. Crucially, then, for both utilitarian utility functions, the question whether the gnome is in cell 1 or 2 is irrelevant. There is just one "gnome advice" that is given identically to all (one or two) humans. Whether this advice is given by one gnome or the other or both of them is irrelevant from both gnomes' perspective. The alignment of the humans' goals leads to alignment of the gnomes' goals. The expected utility of some advice can simply be calculated by taking probability 1/2 for both heads and tails, and introducing a factor of 2 in the total utilitarian case, leading to the answers 1/2 and 2/3, in accordance with UDT and the Benevolent Creator.

The situation looks different if the humans are selfish. We can no longer assume that the gnomes already have a utility function. The gnome cannot yet care about the human that may appear in its cell, since with probability 1/4 (if the gnome is in cell 2 and the coin shows heads) there will not be a human to care for. (By contrast, it is already possible to care about the average utility of all humans there will be, which is where the alleged isomorphism between the two cases breaks down.) It is still true that there is just one "gnome advice" that is given identically to all (one or two) humans, but the method for calculating the optimal advice now differs. In three of the four equiprobable "worlds" the gnome can live in, a human will appear in its cell after the coin flip. Two out of these three are tail worlds, so the gnome decides to advise paying up to 2/3$ for the lottery ticket if a human appears in its cell.

There is a way to restore the equivalence between the average utilitarian and the selfish case. If the humans will be selfish, we can say that the gnome cares about the average well-being of the three humans which will appear in its cell with equal likelihood: the human created after heads, the first human created after tails, and the second human created after tails. The gnome expects to adopt each of these three humans' selfish utility function with probability 1/4. It thus makes sense to say that the gnome cares about the average well-being of these three humans. This is the correct correspondence between selfish and average utilitarian values and it leads, again, to the conclusion that the correct advice is to pay up to 2/3$ for the lottery ticket.

In Anthropic Bias, Nick Bostrom argues that each human should assign probability 1/2 to the coin having shown tails ("SSA odds"). He also introduces the possible answer 2/3 ("SSA+SIA", nowadays usually simply called "SIA") and refutes it. SIA odds have been defended by Olum. The main argument against SIA is the Presumptuous Philosopher. Main arguments for SIA and against SSA odds are that SIA avoids the Doomsday Argument[1], which most people feel has to be wrong, that SSA odds depend on whom you consider to be part of your "reference class", and furthermore, as pointed out by Bostrom himself, that SSA odds allow for acausal superpowers.

The consensus view on LW seems to be that much of the SSA vs. SIA debate is confused and due to discussing probabilities detached from decision problems of agents with specific utility functions. (ETA: At least this was the impression I got. Two commenters have expressed scepticism about whether this is really the consensus view.) I think that "What are the odds at which a selfish agent should bet on tails?" is the most sensible translation of "What is the probability that the coin has shown tails?" into a decision problem. Since I've argued that selfish agents should take bets following SIA odds, one can employ the Presumptuous Philosopher argument against my conclusion: it seems to imply that selfish agents, like total but unlike average utilitarians, should bet at extreme odds on living in an extremely large universe, even if there's no empirical evidence in favor of this. I don't think this counterargument is very strong. However, since this post is already quite lengthy, I'll elaborate more on this if I get encouraging feedback for this post.

[1] At least its standard version. SIA comes with its own Doomsday conclusions, cf. Katja Grace's thesis Anthropic Reasoning in the Great Filter.

