Thought of this after reading the discussion following abcd_z's post on utilitarianism, but it seemed sufficiently different that I figured I'd post it as a separate topic. It feels like the sort of thing that must have been discussed on this site before, but I haven't seen anything like it (I don't really follow the ethical philosophy discussions here), so pointers to relevant discussion would be appreciated.

Let's say I start off with some arbitrary utility function and I have the ability to arbitrarily modify my own utility function. I then become convinced of the truth of preference utilitarianism. Now, presumably my new moral theory prescribes certain terminal values that differ from the ones I currently hold. To be specific, my moral theory tells me to construct a new utility function using some sort of aggregating procedure that takes as input the current utility functions of all moral agents (including my own). This is just a way of capturing the notion that if preference utilitarianism is true, then my behavior shouldn't be directed towards the fulfilment of my own (prior) goals, but towards the maximization of preference satisfaction. Effectively, I should self-modify to have new goals.

But once I've done this, my own utility function has changed, so as a good preference utilitarian, I should run the entire process over again, this time using my new utility function as one of the inputs. And then again, and again... Let's look at a toy model. In this universe, there are two people: me (a preference utilitarian) and Alice (not a preference utilitarian). Let's suppose Alice does not alter her utility function in response to changes in mine. There are two exclusive states of affairs that can be brought about in this universe: A and B. Alice assigns a utility of 10 to A and 5 to B, while I initially assign a utility of 3 to A and 6 to B. Assuming the correct way to aggregate utility is by averaging, I should modify my utilities to 6.5 for A and 5.5 for B. Once I have done this, I should again modify to 8.25 for A and 5.25 for B. Evidently, my utility function will converge towards Alice's.
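Here is a minimal sketch of that iteration, assuming the straight-averaging rule above (the stopping point of 30 steps is arbitrary):

```python
# Toy model: the utilitarian repeatedly replaces their utilities with the
# average of their current utilities and Alice's (who never changes).
alice = {"A": 10.0, "B": 5.0}   # fixed, non-utilitarian
mine = {"A": 3.0, "B": 6.0}     # the preference utilitarian's starting point

for step in range(1, 31):
    mine = {s: (mine[s] + alice[s]) / 2 for s in mine}
    if step <= 2:
        print(step, mine)   # step 1: A=6.5, B=5.5; step 2: A=8.25, B=5.25

print(mine)   # approaches A=10, B=5, i.e. Alice's utilities
```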

I haven't thought this through carefully, but I think the same convergence will occur if we add more utilitarians to the universe. If we add more Alice-type non-utilitarians there is no guarantee of convergence. So anyway, this seems to me a pretty strong argument against utilitarianism. If we have a society of perfect utilitarians, a single defector who refuses to change her utility function in response to changes in others' can essentially bend the society to her will, forcing (through the power of moral obligation!) everybody else to modify their utility functions to match hers, no matter what her preferences actually are. Even if there are no defectors, all the utilitarians will self-modify until they arrive at some bland (value judgment alert) middle ground.
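A small simulation of the single-defector case, under the same averaging assumption (the number of utilitarians and all utility values are arbitrary):

```python
import random

# Several utilitarians keep re-averaging over everyone's current utilities;
# one defector never updates. The utilitarians converge to the defector.
random.seed(0)
defector = {"A": 1.0, "B": 0.0}
agents = [{"A": random.random(), "B": random.random()} for _ in range(5)]

for _ in range(200):
    everyone = agents + [defector]
    avg = {s: sum(a[s] for a in everyone) / len(everyone) for s in defector}
    agents = [dict(avg) for _ in agents]   # each utilitarian adopts the average

print(agents[0])   # ~{'A': 1.0, 'B': 0.0}: the defector's utilities
```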

Now that I think about it, I suspect this is basically just a half-baked corollary to Bernard Williams' famous objection to utilitarianism:

The point is that [the agent] is identified with his actions as flowing from projects or attitudes which… he takes seriously at the deepest level, as what his life is about… It is absurd to demand of such a man, when the sums come in from the utility network which the projects of others have in part determined, that he should just step aside from his own project and decision and acknowledge the decision which utilitarian calculation requires. It is to alienate him in a real sense from his actions and the source of his action in his own convictions. It is to make him into a channel between the input of everyone's projects, including his own, and an output of optimific decision; but this is to neglect the extent to which his projects and his decisions have to be seen as the actions and decisions which flow from the projects and attitudes with which he is most closely identified. It is thus, in the most literal sense, an attack on his integrity.

Anyway, I'm sure ideas of this sort have been developed much more carefully and seriously by philosophers, or even other posters here at LW. As I said, any references would be greatly appreciated.


I think the standard patch is just to say that you should only take into account people's "selfish" utilities when you aggregate.

How do you distinguish between "selfish" and "non-selfish" utilities, though? In the toy example I gave, at one stage my utilities are 3 for A and 6 for B, and at another stage they are 6.5 for A and 5.5 for B. There's nothing intrinsically selfish or unselfish about either distribution. The difference is just in their histories -- the second set of utilities emerges from updating based on preference utilitarianism. So I'm guessing that you want to say the utilitarian should continue to work with the initial utility function as a representation of his preferences, even though it is no longer an accurate representation of his preferences. That seems strange to me. Why should the preference utilitarian care about that utility function, which doesn't represent the actual preferences of anybody in the world? The patch just seems ad hoc and not true to the motivating spirit of preference utilitarianism.

I guess one way to go would be to say that in some sense that initial utility function is still his "actual" utility function, but it has been somehow confounded by his ethics. I think unmooring utility theory from revealed preference theory is not a good road to go down, though. It leads to the same sorts of problems that led people to abandon hedonic utilitarianism for preference utilitarianism in the first place.

How do you distinguish between "selfish" and "non-selfish" utilities, though?

In principle, I have no idea. But in practice with humans, people tend to automatically separate their desires into "things for me" and "things for others".

(I'm not actually a preference utilitarian, so don't take my opinion as gospel.)

But in practice with humans, people tend to automatically separate their desires into "things for me" and "things for others".

Separating preferences that way would make preference utilitarianism even more unattractive than it already is, I think. Critics already complain about the preferences of Gandhi and Ted Bundy getting equal weight. Under this patched scheme, Gandhi actually gets less weight than Ted Bundy because many of his preferences (the ones we admire the most, the other-regarding ones) don't count when we're aggregating, whereas Ted Bundy (who for the sake of argument only has selfish preferences) incurs no such penalty.

If you restrict the utilities being aggregated to "selfish" utilities, then in general, even though the utility functions of altruists are not being properly represented, altruists will still be better off than they would be in a more neutral aggregation. For instance, suppose Gandhi and Ted Bundy have "selfish" utility functions S_G and S_B respectively, and "actual" utility functions U_G and U_B. Since Gandhi is an altruist, U_G = S_G + S_B. Since Ted Bundy is selfish, U_B = S_B. If you aggregate by maximizing the sum of the selfish utility functions, then you are maximizing S_G + S_B, which is exactly the same as Gandhi's actual utility function, so this is Gandhi's most preferred outcome. If you maximize U_G + U_B, then the aggregation ends up worse for Gandhi according to his actual preferences, even though the only change was to make the representation of his preferences for the aggregation more accurate.
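A numeric illustration of this point (the outcome labels and all the selfish utility numbers below are made up for the example):

```python
# U_G = S_G + S_B for the altruist, U_B = S_B for the egoist, as above.
outcomes = ["A", "B"]
S_G = {"A": 5.0, "B": 1.0}   # Gandhi's selfish utilities (hypothetical)
S_B = {"A": 2.0, "B": 5.0}   # Ted Bundy's selfish utilities (hypothetical)

U_G = {o: S_G[o] + S_B[o] for o in outcomes}   # Gandhi's actual preferences
U_B = dict(S_B)                                # Bundy's actual preferences

pick_selfish = max(outcomes, key=lambda o: S_G[o] + S_B[o])  # aggregate selfish utilities only
pick_actual = max(outcomes, key=lambda o: U_G[o] + U_B[o])   # aggregate actual utilities

print(pick_selfish, U_G[pick_selfish])   # A 7.0: Gandhi's most preferred outcome
print(pick_actual, U_G[pick_actual])     # B 6.0: Bundy counted twice, worse for Gandhi
```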

There seem to be two different notions of "selfish" utilities in play here. One is "pre-update" utility, i.e. the utility function as it is prior to being modified by preference utilitarianism (or some other altruistic algorithm). That seems to be the interpretation you're using here, and the one I was using in this comment.

Oscar_Cunningham, in his response, seemed to be using a different notion though. He identified "selfish" utility as "things for me" desires. I understood this to mean purely self-regarding desires (e.g. "I want a cheeseburger" rather than "I want the hungry to be fed"). This is an orthogonal notion. Preferences that are "non-selfish" in this sense (i.e. other-regarding) can be "selfish" in the sense you're using (i.e. they can be pre-update).

The comment you were responding to was employing Oscar_Cunningham's notion of selfishness (or at least my interpretation of his position, which might well be wrong), so what you say doesn't apply. In particular, with this notion of selfishness, U_G will not simply equal S_G + S_B, since Gandhi's other-regarding goals are not identical to Ted Bundy's self-regarding goals. For instance, Gandhi could want Ted Bundy to achieve spiritual salvation even though Bundy doesn't want this for himself. In that case, ignoring "unselfish" desires would simply mean that some of Gandhi's desires don't count at all.

I agree with the point you're making if we use the "pre-update" notion of selfishness, but then I think my objection in this comment still applies.

Does this seem right?

True, if Gandhi's other-regarding preferences are sufficiently different from Ted Bundy's self-regarding preferences, then Gandhi will be better off according to his total preferences if we maximize the sum of their total preferences instead of the sum of their self-regarding preferences.

Of course, all this only makes any sense if we're talking about an aggregation used by some other agent. Presumably Gandhi himself would not adopt an aggregation that makes him worse off according to his total preferences.

How do you distinguish between "selfish" and "non-selfish" utilities, though?

Someone who has both selfish and non-selfish utilities has to have some answer to this, but there are many possible solutions, and which solution you "should" use depends on what you care about. In the iterative convergence scenario you described in the original post, you implicitly assumed that the utilitarian agent already had a solution to this. After all, the agent started with some preferences before updating its utility function to account for the wellbeing of others. That makes it pretty easy: the agent could just declare that its preferences before the first iteration were its selfish preferences, and the preferences added in the first iteration were its non-selfish preferences, thus justifying stopping after one iteration, just as you would intuitively expect. Or maybe the agent will do something different (if it arrived at its preferences by some route other than starting with selfish preferences and adding in non-selfish preferences, then I guess it would have to do something different).

There are A LOT of ways an agent could partition its preferences into selfish and non-selfish components. What do you want me to do? Pick one and tell you that it's the correct one? But then what about all the agents that partition their preferences into selfish and non-selfish components in a completely different manner that still seems reasonable?

For the following analysis, I am assuming bounded utilities. I will normalize all utilities to between 0 and 1.

What you are observing is not a bug. If your preferences are 100% preference utilitarianism, then there is no reason to think they would pull in any direction other than whatever maximizes the preferences of everyone else. If you have any selfish goals, that is not purely utilitarianism, but that is okay!

If we are not 100% utilitarian, then there is no problem. Let's say that my preferences are 90% utilitarian, and 10% maximizing my own happiness. This fixes the problem, because there is 10% of my utility function that is unaffected by my utilitarianism. In fact, my utilitarian side includes a term for my own happiness, so my happiness actually counts for something like 10.00000001%, depending on the population. This all works fine, as long as everyone has at least a little bit of selfish preferences.
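For concreteness, a quick check of that figure, assuming the utilitarian term is an average over roughly nine billion agents (the population size here is my assumption):

```python
# A 90% utilitarian term averaged over N agents (self included) plus a 10%
# selfish term gives one's own happiness a total weight of 0.1 + 0.9/N.
N = 9_000_000_000   # assumed population size
own_weight = 0.1 + 0.9 / N
print(f"{own_weight * 100:.8f}%")   # 10.00000001%
```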

Imagine if everyone had utility functions that were at least 1% terminal goals that do not reference other people. Then in calculating my utility in a given world state, I will have my utility function pointing to someone else's, which might point back at mine. However, with each level of recursion, 1% of the remaining undefined part of the function will become actually defined.

The only time we run into a problem is in situations like where my utility function is defined to equal yours and yours is defined to equal mine. As long as we avoid this 100% recursion, we are fine.
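A small sketch of that claim, assuming each agent puts 1% of its weight on its own happiness and splits the remaining 99% evenly over the others' utilities (the number of agents and the happiness values are arbitrary):

```python
import random

# Each level of the fixed-point iteration pins down more of the mutually
# referential utilities; the 1% non-referential anchor makes it converge.
random.seed(1)
n = 5
happiness = [random.random() for _ in range(n)]   # each agent's own-goal term
U = [0.0] * n                                     # initial guesses

for _ in range(1000):
    U = [0.01 * happiness[i]
         + 0.99 * sum(U[j] for j in range(n) if j != i) / (n - 1)
         for i in range(n)]

print(U)   # a unique, well-defined fixed point
```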

There is not even a problem if we have negative utility coming from other people's utility. For example, if my utility was 50% my happiness and 50% your utility, and yours was 50% your happiness and 50% one minus my utility, we are still fine. If my utility is X and my happiness is x, and your utility is Y and your happiness is y, then we get X=(x+Y)/2=(2x+y+1-X)/4, which simplifies to X=(2x+y+1)/5.
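A quick numeric check of that fixed point (the happiness values are arbitrary; utilities are normalized to [0, 1] as above):

```python
# Iterate the two definitions from arbitrary starting guesses and compare
# against the closed form X = (2x + y + 1) / 5.
x, y = 0.7, 0.2      # arbitrary happiness values
X, Y = 0.0, 0.0      # initial guesses

for _ in range(200):
    X, Y = (x + Y) / 2, (y + (1 - X)) / 2

print(X, (2 * x + y + 1) / 5)   # both ~0.52
```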

I am 100% utilitarian, but because others value me having my own preference, there is an isolated 0% sub-utility function that I can defer to for such times. In the presence of others, my utility function will perfectly match theirs. When alone, I am to advance and develop that zero-utility sub-function for those times when I'm confronted by agents that value my being myself. Of course, to truly do that, to be true to myself, this means that when I am alone, I am to work on the one thing that makes me the most happy: Maximizing the sum utilities of all agents. Any agent that values personality in me beyond perfect selflessness is rejecting my identity, but since my identity has no value to me, I can adopt whatever personality they value beyond selflessness.

In the presence of another 100% utilitarian agent, we will have to have a battle of values: There can be only one (perfectly selfless agent).

And I called dibs.


This is stupid.

Thank you for avoiding inferential silence.

I think you only have a problem if everyone is a perfectly selfless agent. In fact, a room with many of you and one of me would not only be well defined, but probably be very useful according to my ethics.

Those are just "copies" of me; they're already accounted for. But now you've got an entire room of me insisting you aren't allowed to be 100% utilitarian. We have a secret method of detecting copies of us, which is why we're singling you out. Also, we act differently so you don't get freaked out by an obvious hive mind presence. That would just be creepy. Even by my standards. (Get it? "My" standards? Ah forget it...)

Another problem I like: in our world people have very real preferences over how one should go about analyzing moral problems (including preferences over how to engage in meta-ethics). You'll find that most people are very anti-utilitarian. A true preference utilitarian will self-modify into an agent that thinks about morality in the way most preferred by the population, i.e., vaguely moral realist virtue ethics. This is less of a problem with extrapolated-preference utilitarianism (but then, how do you utilitarian-justifiably determine how to extrapolate preferences, except by looking at existing preferences about extrapolation-like processes?), and barely a problem at all for non-preference utilitarianism, as far as I can see.

Also note that the epistemic problem of preference elicitation is very real here, and in fact I do not see why a preference utilitarian wouldn't be obliged to engage in preference elicitation in an itself-utilitarianly-justified manner, which won't seem as epistemically sound as what a utilitarian would naively reckon to be a good faith preference elicitation algorithm. In general I think preference utilitarianism runs into many of the same problems as epistemic majoritarianism. A term that covers both cases might be "decision-policy majoritarianism".

... my moral theory tells me to construct a new utility function using some sort of aggregating procedure that takes as input the current utility functions of all moral agents (including my own).

Right.

But once I've done this, my own utility function has changed

Not necessarily right. And fortunately not: "change your utility function" is typically contra-utility for your existing utility function, and it would be hard to convince others to behave morally if your thesis always entailed "you should do things that will make the world worse by your own current preferences".

Utilitarian preferences with aggregated utility functions can result from negotiation, not just from remodeling your brain. In situations working with this model, your utility function doesn't change, your partners' utility functions don't change, but you all find that each of those utility functions will end up better-satisfied if you all try to optimize some weighted combination of them, because the cost of identifying and punishing defectors is still less than the cost of allowing defectors. Presumably you and your partners already assign some terminal value to others, but the negotiation process doesn't have to increase that terminal value, just add instrumental value.

This kind of problem goes back to Bentham and the very beginning of utilitarianism. (Disclaimer: what follows is based on my recollections of an essay by A. J. Ayer on Bentham, read long ago. I cannot find it now online, nor find other discussions of Bentham making the same points, and I am not any kind of expert on the matter. So when I say below "Bentham believed…", this could be true, or could be true only in Ayer's interpretation, or even only in my own interpretation of Ayer's interpretation.)

Bentham believed both that each person pursues only their own happiness, and that the good consists of the greatest happiness of the greatest number. (The first of these could correspond, in your formulation of the dilemma, to saying that agents have utility functions and behave rationally according to them, and the second to a statement of preference utilitarianism.) Then the problem comes up of how we are to be utilitarians and try to maximize global happiness if we are psychologically necessitated to care only about our own. Bentham's solution is to postulate a "lawgiver", a person whose happiness is greatest when everyone's happiness is maximized, and to say that utilitarianism as a political prescription says that laws should be made by this lawgiver. (In LW-language this could correspond to FAI!)

Translating Bentham's solution (or my memory of Ayer's interpretation of it) back to your question, I think the answer would be that utility functions don't change; if you are truly a preference utilitarian, then your utility function is already given by the ultimate fixed point of the iteration process you have outlined, so there is no dynamical changing.

Translating Bentham's solution (or my memory of Ayer's interpretation of it) back to your question, I think the answer would be that utility functions don't change; if you are truly a preference utilitarian, then your utility function is already given by the ultimate fixed point of the iteration process you have outlined, so there is no dynamical changing.

Not sure how this solves anything though. In a society with non-utilitarian defectors, they will still end up determining the fixed points. So the behavior of the society is determined by its least moral members (by the utilitarians' own lights). Is the response supposed to be that once you do away with the dynamical changing process there is no effective distinction any more between utilitarians and defectors? That would only seem to solve things at the price of massive non-realism. There does seem to be a pretty clear (and morally relevant) real-world distinction between agents who alter their behavior (altruistically) upon learning about the utility functions of others and agents who don't. I don't think hand-waving that distinction away in your model is all that helpful.

Alejando1's point was that Bentham expected everyone to be a "defector", in your terminology, but that lawmakers should be given selfish incentives to maximize the sum of everyone's utility. Although it is unclear to me who could be motivated to ensure that the lawmakers' incentives are aligned with everyone's utility if they are all just concerned with maximizing their own utility.

Also, as long as we're talking about utilitarianism as described by Bentham, it's worth pointing out that by "utility", Bentham meant happiness, rather than the modern decision-theory formulation of utility. According to Alejando, if I understand him correctly, Bentham just sort of assumed that personal happiness is all that motivated anyone.

According to Alejando, if I understand him correctly, Bentham just sort of assumed that personal happiness is all that motivated anyone.

Yes, I remember Ayer making explicit this assumption of Bentham and criticizing it as either untrue or vacuous, depending on interpretation.

I'd note a similarity with convergent series, in particular geometric series. Just because you approach someone an infinite number of times doesn't mean you get all that close to them. (ETA: 1/2^n gets closer to -1 at every one of infinitely many steps, but never gets within a distance of 1 of it.)
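A one-line illustration of that parenthetical:

```python
# The distance from 1/2**n to -1 shrinks at every step but never drops below 1.
distances = [abs(0.5 ** n - (-1)) for n in range(10)]
print(distances)                       # 2.0, 1.5, 1.25, ... approaching 1
print(all(d > 1 for d in distances))   # True
```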

Also, even if you assign yourself equal weight as anyone else, you are usually more able to affect yourself than anyone else, and a lot of the others' demands cancel out. So your tiny chunk of utility, though insignificant with respect to your overall utility function, produces a very significant derivative with respect to your possible actions.


Regarding preference utilitarianism, why can't the negative utility of not having a preference fulfilled be modelled with average or total utilitarianism? That is, aren't there some actions that create so much utility that they could overcome the negative utility of one's preference not being honored? I don't see why preference fulfillment should be first class next to pleasure and pain.

Sorry if this is off-topic, that was just my first reaction to reading this.

See here for some standard criticisms of hedonic (pleasure/pain based) utilitarianism.

Also see the discussions of wireheading on LW.

Incidentally, I should point out that in the economics and decision theory literature, "utility" is not a synonym for pleasure or some other psychological variable. It's merely a mathematical representation of revealed preferences (preferences which may be motivated by an ultimate desire for pleasure, but that's an additional substantive hypothesis). I tend to use "utility" in this sense, so just a terminological heads-up.