Comment author: Manfred 24 October 2014 07:03:09PM 2 points

> still a big question what they argue

To be blunt, this is a question you can solve. Since it's a non-anthropic problem (though there is some danger in Beluga's analysis), vanilla UDT is all that's needed.

> we still don't have evidence that the humans should follow them

The evidence goes as follows: The gnomes are in the same situation as the humans, with the same options and the same payoffs. Although they started with different information than the humans (especially since the humans didn't exist), at the time when they have to make the decision they have the same probabilities for payoffs given actions (although there's a deeper point here that could bear elaboration). Therefore the right decision for the gnome is also the right decision for the human.

This sounds an awful lot like an isomorphism argument to me... What sort of standard of evidence would you say is appropriate for an isomorphism argument?

Comment author: lackofcheese 25 October 2014 12:58:37AM 0 points

The deeper point is important, and I think you're mistaken about the necessary and sufficient conditions for an isomorphism here.

If a human appears in a gnome's cell, then that excludes the counterfactual world in which the human did not appear in the gnome's cell. However, on UDT, the gnome's decision does depend on the payoffs in that counterfactual world.

Thus, for the isomorphism argument to hold, the preferences of the human and gnome must align over counterfactual worlds as well as factual ones. It is not sufficient to have the same probabilities for payoffs given linked actions when you have to make a decision; you also have to have the same probabilities for payoffs given linked actions when you don't have to make a decision.

Comment author: Karl 24 October 2014 09:12:20PM 3 points

Firstly, the important part of my modification to the indifference formalism is not conditioning on the actual o; it's the fact that, in evaluating the expectation of UN, it takes the action in A2 (for a given pair (a1, o)) that maximizes UN instead of the action that maximizes U (note that U is equal to US in the case that o is in Press).

Secondly, an agent that chooses a1 by simply maximizing E[UN | NotPress; a1] + E[US | Press; a1] does exhibit pathological behavior. In particular, there will still be incentives to manage the news, but now from both sides: there is an incentive to cause the button to be pressed in the event of information that is bad news from the point of view of UN, and an incentive to cause the button not to be pressed in the event of information that is bad news from the point of view of US.

Comment author: lackofcheese 24 October 2014 11:13:15PM 2 points

I think this means "indifference" isn't really the right term any more, because the agent is not actually indifferent between the two sets of observations, and doesn't really need to be.

So, how about U(a1, o, a2) =
UN(a1, o, a2) + max_b(US(a1, o, b)), if o is not in Press
US(a1, o, a2) + max_b(UN(a1, o, b)), if o is in Press

or, in your notation, U(a1, o, a2) = g(a1, o) + UN(a1, o, a2) if o is not in Press, or US(a1, o, a2) + f(a1, o) if o is in Press.

Comment author: lackofcheese 24 October 2014 11:00:08PM 1 point

OK, you're right on that point; I misunderstood the "managing the news" problem because I hadn't quite realised that it was about shifting observations between the Press/NotPress sets. As you've said, the only resolution is to select a1 based on
E[max_b(UN(a1, O, b)) | O; a1]
and not
E[max_b(UN(a1, O, b)) | O not in Press; a1]

Comment author: lackofcheese 24 October 2014 06:10:38PM 2 points

OK, following on from Karl's comment below, which seems to resolve Section 4.2, I think it shouldn't be that difficult to resolve some or all of the Section 4.1 concerns as well. In order to satisfy Desideratum 4, we need the agent to be incentivised to build in safeguards to ensure that its possible future actions will be rated well according to US in cases where the shutdown button is pressed. The most obvious way to do this is to penalise situations where there is no good shutdown action according to US and reward situations where there is such an action, but this fails according to Desideratum 2, because we want the agent to remain indifferent as to whether the shutdown button is actually pressed.

However, there appears to be a simple solution; consider the "penalty term"
g(a1) = E[max_b(US(a1, O, b)) | O in Press; a1],
which represents the expected value of US when the shutdown button is pressed. The first step in doing things properly is to add that term to observations not in Press, rather than subtracting it from observations in Press; this helps to ensure future corrigibility (and hence also corrigibility of subagents and successor agents).

Additionally, to resolve the Section 4.2 concerns we also want the agent to be indifferent to pressing its own shutdown button in cases where it gets unlucky, and the way we do that is, as per Karl's post, by calculating the bonus term conditional on the actual observation, or on the set of observations in Press, rather than conditional on the set of possible observations not in Press, i.e. Karl's f(a1, o) = max_b(UN(a1, o, b)).

Finally, we will also need the additional terms
fp(a1) = E[max_b(UN(a1, O, b)) | O in Press; a1] = E[f(a1, O) | O in Press; a1]
fnp(a1) = E[max_b(UN(a1, O, b)) | O not in Press; a1] = E[f(a1, O) | O not in Press; a1]
in order to make things work.

The utility function I propose is

U(a1, o, a2) = {
g(a1) + UN(a1, o, a2), for o not in Press.
US(a1, o, a2) + fnp(a1) + f(a1, o) - fp(a1), for o in Press.
}

Let's verify each of the Desiderata in turn.
(1) is satisfied because the value of a2 conditional on any o in Press depends only upon US, not UN.
(2) and (3) are not satisfied :( because of the "managing the news" problem, as Karl points out below. That being said, we've ensured indifference to the shutdown button because the expected value for any action conditional on Press is equivalent to its expected value conditional on NotPress, i.e. E[U | NotPress ; a1] = E[U | Press; a1], which holds because of the following three equations:
g(a1) = E[max_b(US(a1, O, b)) | O in Press; a1],
fnp(a1) = E[max_b(UN(a1, O, b)) | O not in Press; a1], and
fp(a1) = E[f(a1, O) | O in Press; a1].

(4) holds because the g(a1) term is added to the utility of every action, and will be greater for actions that promote future corrigibility.
(5) holds because, apart from the g(a1) term, which is required to satisfy (4), the agent is simply maximising E[UN | NotPress; a1] over all possible a1, which is what we wanted.

In actual fact, when choosing a1 our agent is simply maximising E[UN | NotPress; a1] + E[US | Press; a1].
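A quick numerical check of the indifference claim and of the last sentence may help. This is only a sketch: it fixes a single a1, and the observation set, probabilities, action names and payoff tables below are all invented for illustration. It assumes the agent picks a2 to maximise U after seeing o, then compares E[U | NotPress; a1] with E[U | Press; a1] and with E[max_b US | Press; a1] + E[max_b UN | NotPress; a1].

```python
from fractions import Fraction as F

# Hypothetical toy model for ONE fixed first action a1 (all numbers invented).
# Each observation o maps to (probability, whether o is in Press).
OBS = {"o1": (F(1, 2), False), "o2": (F(1, 5), False), "o3": (F(3, 10), True)}
A2 = ["b1", "b2"]  # hypothetical second actions
# Hypothetical payoff tables UN[o][b] and US[o][b].
UN = {"o1": {"b1": 3, "b2": 1}, "o2": {"b1": 0, "b2": 4}, "o3": {"b1": 5, "b2": 2}}
US = {"o1": {"b1": 1, "b2": 2}, "o2": {"b1": 7, "b2": 0}, "o3": {"b1": 0, "b2": 6}}


def cond_exp(value, press):
    """E[value(O) | O in Press] if press is True, else E[value(O) | O not in Press]."""
    items = [(p, o) for o, (p, in_press) in OBS.items() if in_press == press]
    return sum(p * value(o) for p, o in items) / sum(p for p, _ in items)


def f(o):
    """Karl's term f(a1, o) = max_b UN(a1, o, b)."""
    return max(UN[o][b] for b in A2)


g = cond_exp(lambda o: max(US[o][b] for b in A2), press=True)
fp = cond_exp(f, press=True)
fnp = cond_exp(f, press=False)


def U(o, b):
    """Proposed utility: g + UN outside Press, US + fnp + f - fp inside Press."""
    if OBS[o][1]:
        return US[o][b] + fnp + f(o) - fp
    return g + UN[o][b]


def best_U(o):
    """Value of U when a2 is chosen to maximise U after observing o."""
    return max(U(o, b) for b in A2)


e_notpress = cond_exp(best_U, press=False)
e_press = cond_exp(best_U, press=True)
p_press = sum(p for p, in_press in OBS.values() if in_press)
e_total = p_press * e_press + (1 - p_press) * e_notpress

print(e_notpress, e_press, e_total)  # 65/7 65/7 65/7
```

Because the extra terms inside Press do not depend on b, the U-maximising a2 there is also the US-maximising one, which is desideratum (1).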

EDIT: I've edited this to reflect the errors pointed out by Karl.

Comment author: Lumifer 24 October 2014 04:18:31PM 1 point

> time for further detail on the problem with pre-emptively submissive gnomes.

One of the aspects of what makes LW what it is -- people with serious expressions on their faces discuss the problems with pre-emptively submissive gnomes and nobody blinks an eye X-D

Comment author: lackofcheese 24 October 2014 04:27:41PM 0 points

I guess your comment means that you must have blinked an eye, so your comment can't be completely true. That said, as discussions of pre-emptively submissive gnomes go, I would generally expect the amount of eye-blinking on LW to be well below average ^_~

Comment author: lackofcheese 24 October 2014 02:40:38AM 3 points

Having established the nature of the different utility functions, it's pretty simple to show how the gnomes relate to these. The first key point to make, though, is that there are actually two distinct types of submissive gnomes and it's important not to confuse the two. This is part of the reason for the confusion over Beluga's post.
Submissive gnome: I adopt the utility function of any human in my cell, but am completely indifferent otherwise.
Pre-emptively submissive gnome: I adopt the utility function of any human in my cell; if there is no human in my cell I adopt the utility function they would have had if they were here.

The two are different precisely in the key case that Stuart mentioned---the case where there is no human at all in the gnome's cell. Fortunately, the utility function of the human who will be in the gnome's cell (which we'll call "cell B") is entirely well-defined, because any existing human in the same cell will always end up with the same utility function. The "would have had" case for the pre-emptively submissive gnomes is a little stranger, but it still makes sense---the gnome's utility would correspond to the anti-indexical component JU of the human's utility function U (which, for selfish humans, is just zero). Thus we can actually remove all of the dangling references in the gnome's utility function, as per the discussion between Stuart and Beluga. If U is the utility function the human in cell B has (or would have), then the submissive gnome's utility function is IU (note the indexicalisation!) whereas the pre-emptively submissive gnome's utility function is simply U.

Following Beluga's post here, we can use these ideas to translate all of the various utility functions to make them completely objective and observer-independent, although some of them reference cell B specifically. If we refer to the second cell as "cell C", swapping between the two gnomes is equivalent to swapping B and C. For further simplification, we use $B to refer to the number of dollars in cell B, and o(B) as an indicator function for whether cell B has a human in it. The simplified utility functions are thus
T = $B + $C
A = ($B + $C) / (o(B) + o(C))
S = IS = $B
IT = o(B) ($B + $C)
IA = o(B) ($B + $C) / (o(B) + o(C))
Z = - $C
H = $B - $C
IH = o(B) ($B - $C)
Note that T and A are the only functions that are invariant under swapping B and C.
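The invariance claim can be checked by brute force over the three possible worlds. This is a sketch in my own encoding, not anything from Beluga's post: each world is a pair of dicts (occupancy o, dollars $) keyed by cell, the ticket price is an arbitrary x = 1/3, and every existing human buys.

```python
from fractions import Fraction as F

x = F(1, 3)  # arbitrary ticket price; every existing human buys
# Each world is (o, $): occupancy indicator and dollars, keyed by cell.
WORLDS = [
    ({"B": 1, "C": 0}, {"B": -x, "C": 0}),         # heads, human in B
    ({"B": 0, "C": 1}, {"B": 0, "C": -x}),         # heads, human in C
    ({"B": 1, "C": 1}, {"B": 1 - x, "C": 1 - x}),  # tails, both cells occupied
]

UTILITIES = {
    "T": lambda o, d: d["B"] + d["C"],
    "A": lambda o, d: (d["B"] + d["C"]) / (o["B"] + o["C"]),
    "S": lambda o, d: d["B"],
    "IT": lambda o, d: o["B"] * (d["B"] + d["C"]),
    "IA": lambda o, d: o["B"] * (d["B"] + d["C"]) / (o["B"] + o["C"]),
    "Z": lambda o, d: -d["C"],
    "H": lambda o, d: d["B"] - d["C"],
    "IH": lambda o, d: o["B"] * (d["B"] - d["C"]),
}


def swap(o, d):
    """Exchange the roles of cells B and C."""
    return {"B": o["C"], "C": o["B"]}, {"B": d["C"], "C": d["B"]}


invariant = sorted(
    name for name, u in UTILITIES.items()
    if all(u(o, d) == u(*swap(o, d)) for o, d in WORLDS)
)
print(invariant)  # ['A', 'T']
```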

This invariance means that, for both cases involving utilitarian humans and pre-emptively submissive gnomes, all of the gnomes (including the one in an empty cell) and all of the humans have the same utility function over all possible worlds. Moreover, all of the decisions are obviously linked, and so there is effectively only one decision. Consequently, it's quite trivial to solve with UDT. Total utilitarianism gives
E[T] = 0.5(-x) + 2*0.5(1-x) = 1-1.5x
with breakeven at x = 2/3, and average utilitarianism gives
E[A] = 0.5(-x) + 0.5(1-x) = 0.5-x
with breakeven at x = 1/2.
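These two expected values can be reproduced mechanically. A minimal sketch, with made-up helper names `worlds`, `expected` and `breakeven`: the worlds are weighted 1/4, 1/4, 1/2 as seen by the gnome in cell B (splitting heads by which cell is occupied does not change T or A), every human is assumed to buy at price x because the decisions are linked, and exact fractions avoid rounding.

```python
from fractions import Fraction as F


def worlds(x):
    """Worlds seen by the gnome in cell B, assuming every existing human buys at price x.

    Each entry is (probability, occupancy o, dollars $), with o and $ keyed by cell.
    """
    return [
        (F(1, 4), {"B": 1, "C": 0}, {"B": -x, "C": 0}),         # heads, human in B
        (F(1, 4), {"B": 0, "C": 1}, {"B": 0, "C": -x}),         # heads, human in C
        (F(1, 2), {"B": 1, "C": 1}, {"B": 1 - x, "C": 1 - x}),  # tails, both occupied
    ]


def T(o, d):
    return d["B"] + d["C"]


def A(o, d):
    return (d["B"] + d["C"]) / (o["B"] + o["C"])


def expected(u, x):
    return sum(p * u(o, d) for p, o, d in worlds(x))


def breakeven(u):
    """Price at which the expected utility (linear in x) of buying crosses zero."""
    e0, e1 = expected(u, F(0)), expected(u, F(1))
    return e0 / (e0 - e1)


print(breakeven(T), breakeven(A))  # 2/3 1/2
```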

In the selfish case, the gnome ends up with the same utility function whether it's pre-emptive or not, because IS = S. Also, there is no need to worry about decision linkage, and hence the decision problem is a trivial one. From the gnome's point of view, 1/4 of the time there will be no human in the cell, 1/2 of the time there will be a human in the cell and the coin will have come up tails, and 1/4 of the time there will be a human in the cell and the coin will have come up heads. Thus
E[S] = 0.25(0) + 0.25(-x) + 0.5(1-x) = 0.5-0.75x
and the breakeven point is x = 2/3, as with the total utilitarian case.
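The selfish case uses the same world weights; only the utility changes. Again this is a sketch in my own encoding, and it also checks that the indexicalised IS gives the same breakeven as S, which is why it doesn't matter whether the gnome is pre-emptive here.

```python
from fractions import Fraction as F


def worlds(x):
    """(probability, occupancy, dollars) for each world, seen from cell B; every human buys."""
    return [
        (F(1, 4), {"B": 1, "C": 0}, {"B": -x, "C": 0}),         # heads, human in B
        (F(1, 4), {"B": 0, "C": 1}, {"B": 0, "C": -x}),         # heads, human in C
        (F(1, 2), {"B": 1, "C": 1}, {"B": 1 - x, "C": 1 - x}),  # tails
    ]


def S(o, d):
    return d["B"]


def IS(o, d):
    """Indexicalised S: the o(B) factor changes nothing, since $B = 0 when B is empty."""
    return o["B"] * S(o, d)


def expected(u, x):
    return sum(p * u(o, d) for p, o, d in worlds(x))


def breakeven(u):
    """Zero crossing of the expected utility, which is linear in x."""
    e0, e1 = expected(u, F(0)), expected(u, F(1))
    return e0 / (e0 - e1)


print(breakeven(S), breakeven(IS))  # 2/3 2/3
```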

In all of these cases so far, I think the humans quite clearly should follow the advice of the gnomes, because
1) Their utility functions coincide exactly over all a priori possible worlds.
2) The humans do not have any extra information that the gnomes do not.

Now, finally, let's go over the reasoning that leads to the so-called "incorrect" answers of 4/5 and 2/3 for total and average utilitarianism. We assume, as before, that the decisions are linked. As per Beluga's post, the argument goes like this:

> With probability 2/3, the coin has shown tails. For an average utilitarian, the expected utility after paying x$ for a ticket is 1/3*(-x)+2/3*(1-x), while for a total utilitarian the expected utility is 1/3*(-x)+2/3*2*(1-x). Average and total utilitarians should thus pay up to 2/3$ and 4/5$, respectively.

So, what's the problem with this argument? In actual fact, for a submissive gnome, that advice is correct, but the human should not follow it. The problem is that a submissive gnome's utility function doesn't coincide with the utility function of the human over all possible worlds, because IT != T and IA != A. The key difference between the two cases is the gnome in the empty cell. If it's a submissive gnome, then it's completely indifferent to the plight of the humans; if it's a pre-emptively submissive gnome then it still cares.

If we were to do the full calculations for the submissive gnome, the gnome's utility function is IT for total utilitarian humans and IA for average utilitarian humans; since IIT = IT and IIA = IA, the calculations are the same if the humans have indexical utility functions. For IT we get
E[IT] = 0.25(0) + 0.25(-x) + 2*0.5(1-x) = 1-1.25x
with breakeven at x = 4/5, and for IA we get
E[IA] = 0.25(0) + 0.25(-x) + 0.5(1-x) = 0.5-0.75x
with breakeven at x = 2/3. Thus the submissive gnome's 2/3 and 4/5 numbers are correct for the gnome, and indeed if the human's total/average utilitarianism is indexical they should just follow the advice, because their utility function would then be identical to the gnome's.
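For comparison, the same kind of sketch (my own encoding, helper names made up) can run the submissive gnome's IT and IA alongside the pre-emptive gnome's T and A. The only difference is the o(B) factor, which zeroes out the world where cell B is empty, and that alone moves the breakevens.

```python
from fractions import Fraction as F


def worlds(x):
    """(probability, occupancy, dollars) for each world, seen from cell B; every human buys."""
    return [
        (F(1, 4), {"B": 1, "C": 0}, {"B": -x, "C": 0}),         # heads, human in B
        (F(1, 4), {"B": 0, "C": 1}, {"B": 0, "C": -x}),         # heads, human in C (B empty)
        (F(1, 2), {"B": 1, "C": 1}, {"B": 1 - x, "C": 1 - x}),  # tails
    ]


def T(o, d):
    return d["B"] + d["C"]


def A(o, d):
    return (d["B"] + d["C"]) / (o["B"] + o["C"])


def IT(o, d):
    return o["B"] * T(o, d)


def IA(o, d):
    return o["B"] * A(o, d)


def expected(u, x):
    return sum(p * u(o, d) for p, o, d in worlds(x))


def breakeven(u):
    """Zero crossing of the expected utility, which is linear in x."""
    e0, e1 = expected(u, F(0)), expected(u, F(1))
    return e0 / (e0 - e1)


for name, u in [("T", T), ("IT", IT), ("A", A), ("IA", IA)]:
    print(name, breakeven(u))
# T 2/3
# IT 4/5
# A 1/2
# IA 2/3
```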

So, if this advice is correct for the submissive gnome, why should the pre-emptively submissive gnome's advice be different? After all, after conditioning on the presence of a human in the cell the two utility functions are the same. This particular issue is indeed exactly analogous to the mistaken "yea" answer in Psy-Kosh's non-anthropic problem. Although I side with UDT and/or the precommitment-based reasoning, I think that question warrants further discussion, so I'll leave that for a third comment.

Comment author: lackofcheese 24 October 2014 03:52:51PM 2 points

OK, time for further detail on the problem with pre-emptively submissive gnomes. Let's focus on the case of total utilitarianism, and begin by looking at the decision in unlinked form, i.e. we assume that the gnome's advice affects only one human if there is one in the room, and zero humans otherwise. Conditional on there being a human in cell B, the expected utility of the human in cell B buying a ticket for $x is, indeed, (1/3)(-x) + (2/3)(1-x) = 2/3 - x, so the breakeven is obviously at x = 2/3. However, if we also assume that the gnome in the other cell will give the same advice, we get (1/3)(-x) + 2(2/3)(1-x) = 4/3 - (5/3)x, with breakeven at x=4/5. In actual fact, the gnome's reasoning, and the 4/5 answer, is correct. If tickets were being offered at a price of, say, 75 cents, then the overall outcome (conditional on there being a human in cell B) is indeed better if the humans buy at 75 cents than if they refuse to buy at 75 cents, because 3/4 is less than 4/5.
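The conditional numbers here are easy to verify. A minimal sketch, assuming (as the comment does) that after seeing a human in cell B the gnome assigns probability 1/3 to heads and 2/3 to tails, and that a total utilitarian's utility is just total dollars; the function names are made up.

```python
from fractions import Fraction as F

# Conditional on a human being in cell B.
P_HEADS, P_TAILS = F(1, 3), F(2, 3)


def unlinked(x):
    """Expected change in total dollars if only the human in cell B buys at price x."""
    return P_HEADS * (-x) + P_TAILS * (1 - x)


def linked(x):
    """Same, but on tails the human in cell C gets the same advice and also buys."""
    return P_HEADS * (-x) + P_TAILS * 2 * (1 - x)


def breakeven(e):
    """Zero of a linear function of the price x."""
    e0, e1 = e(F(0)), e(F(1))
    return e0 / (e0 - e1)


print(breakeven(unlinked), breakeven(linked))  # 2/3 4/5
print(linked(F(3, 4)))                         # 1/12, so buying at 75 cents helps
```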

As I mentioned previously, in the case where the gnome only cares about total $ if there is a human in its cell, then 4/5 is correct before conditioning on the presence of a human, and it's also correct after conditioning on the presence of a human; the number is 4/5 regardless. However, the situation we're examining here is different, because the gnome cares about total $ even if no human is present. Thus we have a dilemma, because it appears that UDT is correct in advising the gnome to precommit to 2/3, but the above argument also suggests that after seeing a human in its cell it is correct for the gnome to advise 4/5.

The key distinction, analogously to mwenger's answer to Psy-Kosh's non-anthropic problem, has to do with the possibility of a gnome in an empty cell. For a total utilitarian gnome in an empty cell, any money at all spent in the other cell translates directly into negative utility. That gnome would prefer the human in the other cell to spend $0 at most, but of course there is no way to make this happen, since the other gnome has no way of knowing that this is the case.

The resolution to this problem is that, for linked decisions, you must (as UDT does) necessarily consider the effects of that decision over all a priori possible worlds affected by that decision. As it happens, this is the same thing as what you would do if you had the opportunity to precommit in advance.

It's a bit trickier to justify why this should be the case, but the best argument I can come up with is to apply that same "linked decision" reasoning at one meta-level up, the level of "linked decision theories". In short, by adopting a decision theory that ignores linked decisions in a priori possible worlds that are excluded by your observations, you are licensing yourself and other agents to do the same thing in future decisions, which you don't want. If other agents follow this reasoning, they will give the "yea" answer in Psy-Kosh's non-anthropic problem, but you don't want them to do that.

Note that decisions in worlds excluded by your observations do not usually tend to be "linked". This is because exclusion by observation would usually imply that you receive a different observation in the other possible world, thus allowing you to condition your decision on that observation and thereby unlinking the decisions. However, some rare problems like the Counterfactual Mugging and Psy-Kosh's non-anthropic problem violate this tendency, and should therefore be treated differently.

Overall, then, the "linked decision theory" argument supports adopting UDT, and it means that you should consider all linked decisions in all a priori possible worlds.

Comment author: Stuart_Armstrong 24 October 2014 11:55:28AM 2 points

Let's ditch the gnomes, they are contributing little to this argument.

My average ut=selfish argument was based on the fact that if you changed the utility of everyone who existed from one system to the other, then people's utilities would be the same, given that they existed.

The argument here is that if you changed the utility of everyone from one system to the other, then this would affect their counterfactual utility in the worlds where they don't exist.

That seems... interesting. I'll reflect further.

Comment author: lackofcheese 24 October 2014 02:38:41PM 2 points

Yep, I think that's a good summary. UDT-like reasoning depends on the utility values of counterfactual worlds, not just real ones.

Comment author: Stuart_Armstrong 24 October 2014 10:34:09AM 1 point

I like your analysis. Interestingly, the gnomes advise in the T and A cases for completely different reasons than in the S case.

But let me modify the case slightly: now the gnomes adopt the utility function of the closest human. This makes no difference to the T and A cases. But now in the S case, the gnomes have a linked decision, and

E[S] = 0.25(-x) + 0.25(-x) + 0.5(1-x) = 0.5-x

This also seems to satisfy "1) Their utility functions coincide exactly over all a priori possible worlds. 2) The humans do not have any extra information that the gnomes do not." Also, the gnomes are now deciding the T, A and S cases for the same reasons (linked decisions).
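The "closest human" variant can be checked the same way. In this sketch (my own encoding; `closest_S` is a hypothetical name), the gnome in cell B scores each world by the dollars of the human in its own cell if there is one, and otherwise by the dollars of the human in the other cell; the plain selfish gnome is included for contrast.

```python
from fractions import Fraction as F


def worlds(x):
    """(probability, occupancy, dollars) for each world, seen from cell B; every human buys."""
    return [
        (F(1, 4), {"B": 1, "C": 0}, {"B": -x, "C": 0}),         # heads, human in B
        (F(1, 4), {"B": 0, "C": 1}, {"B": 0, "C": -x}),         # heads, human in C
        (F(1, 2), {"B": 1, "C": 1}, {"B": 1 - x, "C": 1 - x}),  # tails
    ]


def S(o, d):
    """Plain selfish gnome: dollars in its own cell (0 when the cell is empty)."""
    return d["B"]


def closest_S(o, d):
    """Closest-human gnome: own cell's human if present, otherwise the other cell's."""
    return d["B"] if o["B"] else d["C"]


def expected(u, x):
    return sum(p * u(o, d) for p, o, d in worlds(x))


def breakeven(u):
    """Zero crossing of the expected utility, which is linear in x."""
    e0, e1 = expected(u, F(0)), expected(u, F(1))
    return e0 / (e0 - e1)


print(breakeven(S), breakeven(closest_S))  # 2/3 1/2
```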

Comment author: lackofcheese 24 October 2014 11:16:01AM 2 points

I don't think that works, because 1) isn't actually satisfied. The selfish human in cell B is indifferent over worlds where that same human doesn't exist, but the gnome is not indifferent.

Consequently, I think that as one of the humans in your "closest human" case you shouldn't follow the gnome's advice, because the gnome's recommendation is being influenced by a priori possible worlds that you don't care about at all. This is the same reason a human with utility function T shouldn't follow the gnome recommendation of 4/5 from a gnome with utility function IT. Even though these recommendations are correct for the gnomes, they aren't correct for the humans.

As for the "same reasons" comment, I think that doesn't hold up either. The decisions in all of the cases are linked decisions, even in the simple case of U = S above. The difference in the S case is simply that the linked nature of the decision turns out to be irrelevant, because the other gnome's decision has no effect on the first gnome's utility. I would argue that the gnomes in all of the cases we've put forth have always had the "same reasons" in the sense that they've always been using the same decision algorithm, albeit with different utility functions.

Comment author: lackofcheese 24 October 2014 12:19:52AM 3 points

I think I can resolve the confusion here, but as a quick summary, I'm quite sure Beluga's argument holds up. The first step is to give a clear statement of what the difference is between the indexical and non-indexical versions of the utility functions. This is important because the UDT approach translates to "What is the optimal setting for decision variable X, in order to maximise the expected utility over all a priori possible worlds that are influenced by decision variable X?" On the basis of UDT or UDT-like principles such as an assumption of linked decisions, it thus follows that two utility functions are equivalent for this purpose if and only if they are equivalent over all possible worlds in which the outcomes are dependent upon X.

Now, as the first step in resolving these issues, I think it's best to go over all of the relevant utility functions for this problem. First, let's begin with the three core non-indexical cases (or "indexicality-independent" cases, although I'm not sure of the term):
Indifference (0): I don't care at all about anything (i.e. a constant function).
Total utilitarian (T): I care linearly about the total dollars owned by humans in all possible worlds.
Average utilitarian (A): I care linearly about the average dollars owned by humans in all possible worlds.
There's also one essential operator we can apply to these functions:
Negation (-): -F = my preferences are the exact inverse of F.
e.g. -T would mean that you want humans to lose as many total dollars as possible.

Now for indexical considerations, the basic utility function is
Selfish (S): I care linearly about the number of dollars that I own.
Notably, as applied to worlds where you don't exist, selfishness is equivalent to indifference. With this in mind, it's useful to introduce two indexical operators; first there's
Indexicalization (I): IF(w) = F(w) if you exist in world w, and 0 if you do not exist in world w.
Of course, it's pretty clear that IS = S, since S was already indifferent to worlds where you don't exist. Similarly, we can also introduce
Anti-indexicalization (J): JF(w) = 0 if you exist in world w, and F(w) if you do not exist in world w.

It's important to note that if you can influence the probability of your own existence, the value of the constant function becomes important, so these indexical operators are actually ill-conditioned in the general case. In this case, though, you don't affect the probability of your own existence, and so we may as well pick the constant to be zero. Also, since our utility functions are all denominated in dollars, we can reasonably talk about making linear combinations of them, and so we can add, subtract, and multiply by constants. In general this wouldn't make sense, but it's a useful trick here. With this in mind, we also have the identity IF + JF = F.
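The two operators can be written as higher-order functions on utility functions. This is a minimal sketch under my own assumptions: "you" are the human in cell B (so you exist exactly when o(B) = 1), the constant is 0 as chosen above, and the three worlds use an arbitrary ticket price x = 1/3 with every human buying.

```python
from fractions import Fraction as F

x = F(1, 3)  # arbitrary ticket price
# Each world is (o, $): occupancy indicator and dollars, keyed by cell.
WORLDS = [
    ({"B": 1, "C": 0}, {"B": -x, "C": 0}),         # heads, human in B
    ({"B": 0, "C": 1}, {"B": 0, "C": -x}),         # heads, human in C
    ({"B": 1, "C": 1}, {"B": 1 - x, "C": 1 - x}),  # tails
]


def exists(o):
    """Assumption: 'you' are the human in cell B."""
    return o["B"] == 1


def I(u):
    """Indexicalization: u where you exist, the constant 0 elsewhere."""
    return lambda o, d: u(o, d) if exists(o) else 0


def J(u):
    """Anti-indexicalization: the constant 0 where you exist, u elsewhere."""
    return lambda o, d: 0 if exists(o) else u(o, d)


def S(o, d):
    return d["B"]


def T(o, d):
    return d["B"] + d["C"]


for o, d in WORLDS:
    assert I(S)(o, d) == S(o, d)               # IS = S
    assert J(S)(o, d) == 0                     # JS is indifference
    assert I(T)(o, d) + J(T)(o, d) == T(o, d)  # IT + JT = T
```

Note that JT is nonzero only in the world where cell B is empty, which is exactly the part of T that a submissive gnome (IT) drops and a pre-emptively submissive gnome (T) keeps.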

Now we already have all we need to define the other utility functions discussed here. Indexical total utilitarianism is simply IT, which translates into English as "I care about the total dollars owned by humans, but only if I exist; otherwise I'm indifferent."

As for "hatred", it's important to note that there are several different kinds. First of all, there is "anti-selflessness", which I represent via Z = S - T; this translates to "I don't care about myself, but I want people who aren't me to lose as many dollars as possible, whether or not I exist". Then there's the kind of hatred proposed below, where you still care about your own money as well; that one still comes in two different kinds. There is plain "selfish hatred" H = 2S - T, and then there's its indexical version IH = I(2S - T) = 2S - IT, which translates to "In worlds in which I exist, I want to get as much money as possible and for other people to have as little money as possible". The latter is probably best referred to as "jealousy" rather than hatred. From these definitions, two identities of selfishness as mixes of total utilitarianism and hatred follow pretty clearly, as S = 0.5(H+T) = 0.5(IH+IT).
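The hatred definitions and both decompositions of S can be checked world by world. As before this is my own sketch: utilities are combined pointwise (meaningful here only because everything is in dollars), "you" are the human in cell B, and the price x = 1/3 is arbitrary.

```python
from fractions import Fraction as F

x = F(1, 3)  # arbitrary ticket price
WORLDS = [
    ({"B": 1, "C": 0}, {"B": -x, "C": 0}),         # heads, human in B
    ({"B": 0, "C": 1}, {"B": 0, "C": -x}),         # heads, human in C
    ({"B": 1, "C": 1}, {"B": 1 - x, "C": 1 - x}),  # tails
]


def I(u):
    """Indexicalization, with 'you' being the human in cell B."""
    return lambda o, d: u(o, d) if o["B"] == 1 else 0


def S(o, d):
    return d["B"]


def T(o, d):
    return d["B"] + d["C"]


def Z(o, d):
    """Anti-selflessness, Z = S - T."""
    return S(o, d) - T(o, d)


def H(o, d):
    """Selfish hatred, H = 2S - T."""
    return 2 * S(o, d) - T(o, d)


IH = I(H)  # jealousy
IT = I(T)

for o, d in WORLDS:
    assert Z(o, d) == -d["C"]
    assert H(o, d) == d["B"] - d["C"]
    assert IH(o, d) == 2 * S(o, d) - IT(o, d)
    assert S(o, d) == F(1, 2) * (H(o, d) + T(o, d)) == F(1, 2) * (IH(o, d) + IT(o, d))
```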

Next comment: submissive gnomes, and the correct answers.

EDIT: Apparently the definitions of "hater" used in the other comments assume that haters still care about their own money, so I've updated my definitions.

Comment author: lackofcheese 24 October 2014 02:40:38AM *  3 points [-]

Having established the nature of the different utility functions, it's pretty simple to show how the gnomes relate to these. The first key point to make, though, is that there are actually two distinct types of submissive gnomes and it's important not to confuse the two. This is part of the reason for the confusion over Beluga's post.
Submissive gnome: I adopt the utility function of any human in my cell, but am completely indifferent otherwise.
Pre-emptively submissive gnome: I adopt the utility function of any human in my cell; if there is no human in my cell I adopt the utility function they would have had if they were here.

The two are different precisely in the key case that Stuart mentioned---the case where there is no human at all in the gnome's cell. Fortunately, the utility function of the human who will be in the gnome's cell (which we'll call "cell B") is entirely well-defined, because any existing human in the same cell will always end up with the same utility function. The "would have had" case for the pre-emptively submissive gnomes is a little stranger, but it still makes sense---the gnome's utility would correspond to the anti-indexical component JU of the human's utility function U (which, for selfish humans, is just zero). Thus we can actually remove all of the dangling references in the gnome's utility function, as per the discussion between Stuart and Beluga. If U is the utility function the human in cell B has (or would have), then the submissive gnome's utility function is IU (note the indexicalisation!) whereas the pre-emptively submissive gnome's utility function is simply U.

Following Beluga's post here, we can use these ideas to translate all of the various utility functions to make them completely objective and observer-independent, although some of them reference cell B specifically. If we refer to the second cell as "cell C", swapping between the two gnomes is equivalent to swapping B and C. For further simplification, we use $(B) to refer to the number of dollars in cell B, and o(B) as an indicator function for whether the cell has a human in it. The simplified utility functions are thus
T = $B + $C
A = ($B + $C) / (o(B) + o(C))
S = IS = $B
IT = o(B) ($B + $C)
IA = o(B) ($B + $C) / (o(B) + o(C))
Z = - $C
H = $B - $C
IH = o(B) ($B - $C)
Note that T and A are the only functions that are invariant under swapping B and C.

This invariance means that, for both cases involving utilitarian humans and pre-emptively submissive gnomes, all of the gnomes (including the one in an empty cell) and all of the humans have the same utility function over all possible worlds. Moreover, all of the decisions are obviously linked, and so there is effectively only one decision. Consequently, it's quite trivial to solve with UDT. Total utilitarianism gives
E[T] = 0.5(-x) + 2*0.5(1-x) = 1-1.5x
with breakeven at x = 2/3, and average utilitarianism gives
E[A] = 0.5(-x) + 0.5(1-x) = 0.5-x
with breakeven at x = 1/2.

In the selfish case, the gnome ends up with the same utility function whether it's pre-emptive or not, because IS = S. Also, there is no need to worry about decision linkage, and hence the decision problem is a trivial one. From the gnome's point of view, 1/4 of the time there will be no human in the cell, 1/2 of time there will be a human in the cell and the coin will have come up tails, and 1/4 of the time there will be a human in the cell and the coin will have come up heads. Thus
E[S] = 0.25(0) + 0.25(-x) + 0.5(1-x) = 0.5-0.75x
and the breakeven point is x = 2/3, as with the total utilitarian case.

In all of these cases so far, I think the humans quite clearly should follow the advice of the gnomes, because
1) Their utility functions coincide exactly over all a priori possible worlds.
2) The humans do not have any extra information that the gnomes do not.

Now, finally, let's go over the reasoning that leads to the so-called "incorrect" answers of 4/5 and 2/3 for total and average utilitarianism. We assume, as before, that the decisions are linked. As per Beluga's post, the argument goes like this:

With probability 2/3, the coin has shown tails. For an average utilitarian, the expected utility after paying x$ for a ticket is 1/3*(-x)+2/3*(1-x), while for a total utilitarian the expected utility is 1/3*(-x)+2/3*2*(1-x). Average and total utilitarians should thus pay up to 2/3$ and 4/5$, respectively.

So, what's the problem with this argument? In actual fact, for a submissive gnome, that advice is correct, but the human should not follow it. The problem is that a submissive gnome's utility function doesn't coincide with the utility function of the human over all possible worlds, because IT != T and IA != A. The key difference between the two cases is the gnome in the empty cell. If it's a submissive gnome, then it's completely indifferent to the plight of the humans; if it's a pre-emptively submissive gnome then it still cares.

If we were to do the full calculations for the submissive gnome, the gnome's utility function is IT for total utilitarian humans and IA for average utilitariam humans; since IIT = IT and IIA = IA the calculations are the same if the humans have indexical utility functions. For IT we get
E[IT] = 0.25(0) + 0.25(-x) + 2*0.5(1-x) = 1-1.25x
with breakeven at x = 4/5, and for IA we get
E[IA] = 0.25(0) + 0.25(-x) + 0.5(1-x) = 0.5-0.75x
with breakeven at x = 2/3. Thus the submissive gnome's 2/3 and 4/5 numbers are correct for the gnome, and indeed if the human's total/average utilitarianism is indexical they should just follow the advice, because their utility function would then be identical to the gnome's.

So, if this advice is correct for the submissive gnome, why should the pre-emptive submissive gnome's advice be different? After all, after conditioning on the presence of a human in the cell the two utility functions are the same. This particular issue is indeed exactly analogous to the mistaken "yea" answer in Psy-Kosh's non-anthropic problem. Although I side with UDT and/or the precommitment-based reasoning, I think that question warrants further discussion, so I'll leave that for a third comment.

Comment author: lackofcheese 24 October 2014 12:19:52AM *  3 points [-]

I think I can resolve the confusion here, but as a quick summary, I'm quite sure Beluga's argument holds up. The first step is to give a clear statement of what the difference is between the indexical and non-indexical versions of the utility functions. This is important because the UDT approach translates to "What is the optimal setting for decision variable X, in order to maximise the expected utility over all a priori possible worlds that are influenced by decision variable X?" On the basis of UDT or UDT-like principles such as an assumption of linked decisions, it thus follows that two utility functions are equivalent for this purpose if and only if they are equivalent over all possible worlds in which the outcomes are dependent upon X.

Now, as the first step in resolving these issues, I think it's best to go over all of the relevant utility functions for this problem, beginning with the three core non-indexical cases (or "indexicality-independent" cases, although I'm not sure of the term):
Indifference (0): I don't care at all about anything (i.e. a constant function).
Total utilitarian (T): I care linearly about the total dollars owned by humans, in all possible worlds.
Average utilitarian (A): I care linearly about the average dollars owned by humans, in all possible worlds.
There's also one essential operator we can apply to these functions:
Negation (-): -F means my preferences are the exact opposite of F's.
For example, -T would mean that you want humans to lose as many total dollars as possible.
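A minimal Python sketch of the three core functions and the negation operator, assuming (hypothetically) that a world is represented as a dict mapping each existing human to their dollars, and that every world contains at least one human so the average is defined. The function names are illustrative only.

```python
# Hypothetical representation: a world maps each existing human to their dollars.
# Assumes every world contains at least one human, so the average is defined.


def zero(world):
    """Indifference (0): a constant function."""
    return 0


def total(world):
    """Total utilitarian (T): linear in the total dollars owned by humans."""
    return sum(world.values())


def average(world):
    """Average utilitarian (A): linear in the average dollars owned by humans."""
    return sum(world.values()) / len(world)


def negate(f):
    """Negation (-): -F ranks every world in exactly the opposite order to F."""
    return lambda world: -f(world)


w = {"alice": 3, "bob": 1}
print(zero(w), total(w), average(w), negate(total)(w))  # 0 4 2.0 -4
```

Representing utility functions as plain functions of a world makes operators like negation simple higher-order functions, which is the shape the indexical operators below also take.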

Now for indexical considerations, the basic utility function is
Selfish (S): I care linearly about the dollars that I own.
Notably, as applied to worlds where you don't exist, selfishness is equivalent to indifference. With this in mind, it's useful to introduce two indexical operators; first there's
Indexicalization (I): IF(w) = F(w) if you exist in world w, and 0 if you do not exist in world w.
Of course, it's pretty clear that IS = S, since S was already indifferent to worlds where you don't exist. Similarly, we can also introduce
Anti-indexicalization (J): JF(w) = 0 if you exist in world w, and F(w) if you do not exist in world w.

It's important to note that if you can influence the probability of your own existence, the constant assigned to worlds where you don't exist becomes important, so these indexical operators are not well-defined in the general case. In this case, though, you don't affect the probability of your own existence, so we may as well pick the constant to be zero. Also, since our utility functions are all denominated in dollars, we can reasonably talk about linear combinations of them, so we can add, subtract, and multiply by constants. In general this wouldn't make sense, but it's a useful trick here. With this in mind, we also have the identity IF + JF = F.
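The indexical operators can be sketched in the same hypothetical world-as-dict representation, with `ME` naming the agent whose utility function is being evaluated and the constant for non-existence fixed at zero, as above. The helper names (`indexicalize`, `anti_indexicalize`, `add`) are illustrative. The loop spot-checks IS = S and IF + JF = F (for F = T) on a few made-up worlds; this is a sanity check, not a proof.

```python
# Hypothetical model: a world maps each existing human to their dollars,
# and ME names the agent whose utility function we are evaluating.
ME = "alice"


def total(world):
    """T: total dollars owned by humans."""
    return sum(world.values())


def selfish(world):
    """S: linear in my own dollars; 0 (indifferent) where I don't exist."""
    return world.get(ME, 0)


def indexicalize(f):
    """I: IF(w) = F(w) if I exist in w, else 0."""
    return lambda world: f(world) if ME in world else 0


def anti_indexicalize(f):
    """J: JF(w) = 0 if I exist in w, else F(w)."""
    return lambda world: 0 if ME in world else f(world)


def add(*fs):
    """Pointwise sum; meaningful here only because everything is in dollars."""
    return lambda world: sum(f(world) for f in fs)


WORLDS = [{"alice": 3, "bob": 1}, {"bob": 2}, {"alice": 0, "bob": 5, "carol": 2}]

for w in WORLDS:
    # IS = S: S is already indifferent to worlds where I don't exist.
    assert indexicalize(selfish)(w) == selfish(w)
    # IF + JF = F, checked here for F = T.
    assert add(indexicalize(total), anti_indexicalize(total))(w) == total(w)
```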

Now we already have all we need to define the other utility functions discussed here. Indexical total utilitarianism is simply IT, which translates into English as "I care about the total dollars owned by humans, but only if I exist; otherwise I'm indifferent."

As for "hatred", it's important to note that there are several different kinds. First of all, there is "anti-selflessness", which I represent via Z = S - T; this translates to "I don't care about myself, but I want people who aren't me to lose as many dollars as possible, whether or not I exist". Then there's the kind of hatred proposed below, where you still care about your own money as well; this kind itself comes in two versions. There is plain "selfish hatred" H = 2S - T, and then there's its indexical version IH = I(2S - T) = 2S - IT (using IS = S and the linearity of I), which translates to "In worlds in which I exist, I want to get as much money as possible and for other people to have as little money as possible". The latter is probably best referred to as "jealousy" rather than hatred. From these definitions, two identities expressing selfishness as a mix of total utilitarianism and hatred follow directly: S = 0.5(H + T) = 0.5(IH + IT).
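Continuing the same hypothetical world-as-dict representation, this sketch builds Z, H and IH as linear combinations and spot-checks, on a few invented worlds, that IH = 2S - IT, that S = 0.5(H + T) = 0.5(IH + IT), and that Z amounts to wanting everyone else to have as little money as possible. `ME` and the `combo` helper are hypothetical; exact fractions avoid floating-point noise in the 0.5 coefficients.

```python
from fractions import Fraction

ME = "alice"  # hypothetical agent name


def total(world):
    return sum(world.values())


def selfish(world):
    return world.get(ME, 0)


def indexicalize(f):
    return lambda world: f(world) if ME in world else 0


def combo(*terms):
    """Linear combination sum(c * f) over (coefficient, function) pairs."""
    return lambda world: sum(c * f(world) for c, f in terms)


S, T = selfish, total
IT = indexicalize(T)
Z = combo((1, S), (-1, T))             # anti-selflessness
H = combo((2, S), (-1, T))             # selfish hatred
IH = indexicalize(H)                   # jealousy
IH_expanded = combo((2, S), (-1, IT))  # 2S - IT, using IS = S

half = Fraction(1, 2)
WORLDS = [{"alice": 3, "bob": 1}, {"bob": 2, "carol": 4}, {"alice": 5}]
for w in WORLDS:
    assert IH(w) == IH_expanded(w)
    assert S(w) == combo((half, H), (half, T))(w) == combo((half, IH), (half, IT))(w)
    # Z ignores my own money and wants everyone else to have as little as possible.
    assert Z(w) == -sum(v for k, v in w.items() if k != ME)
```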

Next comment: submissive gnomes, and the correct answers.

EDIT: Apparently the definitions of "hater" used in the other comments assume that haters still care about their own money, so I've updated my definitions.
