Harsanyi's Social Aggregation Theorem and what it means for CEV

AlexMennen

5th Jan 2013

A Friendly AI would have to be able to aggregate each person's preferences into one utility function. The most straightforward and obvious way to do this is to agree on some way to normalize each individual's utility function, and then add them up. But many people don't like this, usually for reasons involving utility monsters. If you are one of these people, then you had better learn to like it, because according to Harsanyi's Social Aggregation Theorem, any alternative can result in the supposedly Friendly AI making a choice that is bad for every member of the population. More formally,

Axiom 1: Every person, and the FAI, are VNM-rational agents.

Axiom 2: Given any two choices A and B such that every person prefers A over B, then the FAI prefers A over B.

Axiom 3: There exist two choices A and B such that every person prefers A over B.

(Edit: Note that I'm assuming a fixed population with fixed preferences. This still seems reasonable, because we wouldn't want the FAI to be dynamically inconsistent, so it would have to draw its values from a fixed population, such as the people alive now. Alternatively, even if you want the FAI to aggregate the preferences of a changing population, the theorem still applies, but this comes with its own problems, such as giving people (possibly including the FAI) incentives to create, destroy, and modify other people to make the aggregated utility function more favorable to them.)

Give each person a unique integer label from $1$ to $n$, where $n$ is the number of people. For each person $k$, let $u_{k}$ be some function that, interpreted as a utility function, accurately describes $k$'s preferences (there exists such a function by the VNM utility theorem). Note that I want $u_{k}$ to be some particular function, distinct from, for instance, $2u_{k}-7$, even though $u_{k}$ and $2u_{k}-7$ represent the same utility function. This is so it makes sense to add them.
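To see why a particular representative has to be fixed before adding, here is a minimal sketch in Python. The two people, the outcomes X and Y, and the numbers are hypothetical, chosen only for illustration, and `rescale` and `best_by_sum` are helper names made up here. Replacing $u_{1}$ with $2u_{1}-7$ leaves person 1's own ranking unchanged, but flips which outcome the unweighted sum favors.

```python
# Hypothetical utilities of two people over two outcomes.
u1 = {"X": 3.0, "Y": 0.0}
u2 = {"X": 0.0, "Y": 4.0}


def rescale(u, a, b):
    """Positive affine transformation a*u + b (a > 0): same preferences, different function."""
    return {outcome: a * value + b for outcome, value in u.items()}


def best_by_sum(*utilities):
    """Outcome that maximizes the unweighted sum of the given functions."""
    outcomes = utilities[0].keys()
    return max(outcomes, key=lambda o: sum(u[o] for u in utilities))


u1_alt = rescale(u1, 2, -7)  # 2*u1 - 7, i.e. {"X": -1.0, "Y": -7.0}

print(best_by_sum(u1), best_by_sum(u1_alt))  # person 1's own favorite, under either function
print(best_by_sum(u1, u2))                   # favorite of u1 + u2
print(best_by_sum(u1_alt, u2))               # favorite of (2*u1 - 7) + u2
```

The flip comes from the factor of 2; the constant $-7$ shifts every outcome equally and drops out of comparisons.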

Theorem: The FAI maximizes the expected value of $\sum_{k=1}^{n}c_{k}u_{k}$, for some set of scalars $\left\{ c_{k}\right\} _{k=1}^{n}$.

Actually, I changed the axioms a little bit. Harsanyi originally used “Given any two choices A and B such that every person is indifferent between A and B, the FAI is indifferent between A and B” in place of my axioms 2 and 3 (also he didn't call it an FAI, of course). For the proof (from Harsanyi's axioms), see section III of Harsanyi (1955), or section 2 of Hammond (1992). Hammond claims that his proof is simpler, but he uses jargon that scared me, and I found Harsanyi's proof to be fairly straightforward.

Harsanyi's axioms seem fairly reasonable to me, but I can imagine someone objecting, “But if no one else cares, what's wrong with the FAI having a preference anyway? It's not like that would harm us.” I will concede that there is no harm in allowing the FAI to have a weak preference one way or another, but if the FAI has a strong preference, that being the only thing that is reflected in the utility function, and if axiom 3 is true, then axiom 2 is violated.

Proof that my axioms imply Harsanyi's: Let A and B be any two choices such that every person is indifferent between A and B. By axiom 3, there exist choices C and D such that every person prefers C over D. Now consider the lotteries $pC+\left(1-p\right)A$ and $pD+\left(1-p\right)B$, for $p>0$. Notice that every person prefers the first lottery to the second, so by axiom 2, the FAI prefers the first lottery. This remains true for arbitrarily small $p>0$, so by continuity, the FAI must not prefer the second lottery for $p=0$; that is, the FAI must not prefer B over A. We can “sweeten the pot” in favor of B the same way, so by the same reasoning, the FAI must not prefer A over B.
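The sweetening step can be written out in expected-utility terms. This is only a sketch: $U$ denotes a VNM utility function for the FAI (a symbol introduced here, which exists by axiom 1), and applying $u_{k}$ or $U$ to a lottery means taking its expected value. For each person $k$ and any $p\in(0,1]$, linearity of expected utility together with $u_{k}(A)=u_{k}(B)$ gives

$$u_{k}\left(pC+(1-p)A\right)-u_{k}\left(pD+(1-p)B\right)=p\left[u_{k}(C)-u_{k}(D)\right]+(1-p)\left[u_{k}(A)-u_{k}(B)\right]=p\left[u_{k}(C)-u_{k}(D)\right]>0,$$

so axiom 2 applies, and the same expansion for the FAI reads

$$p\left[U(C)-U(D)\right]+(1-p)\left[U(A)-U(B)\right]>0\quad\text{for all }p\in(0,1].$$

Letting $p\to0$ gives $U(A)-U(B)\geq0$, which is the claim that the FAI does not prefer B over A.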

So why should you accept my axioms?

Axiom 1: The VNM utility axioms are widely agreed to be necessary for any rational agent.

Axiom 2: There's something a little ridiculous about claiming that every member of a group prefers A to B, but that the group in aggregate does not prefer A to B.

Axiom 3: This axiom is just to establish that it is even possible to aggregate the utility functions in a way that violates axiom 2. So essentially, the theorem is “If it is possible for anything to go horribly wrong, and the FAI does not maximize a linear combination of the people's utility functions, then something will go horribly wrong.” Also, axiom 3 will almost always be true, because it is true when the utility functions are linearly independent, and almost all finite sets of functions are linearly independent. There are terrorists who hate your freedom, but even they care at least a little bit about something other than the opposite of what you care about.

At this point, you might be protesting, “But what about equality? That's definitely a good thing, right? I want something in the FAI's utility function that accounts for equality.” Equality is a good thing, but only because we are risk averse, and risk aversion is already accounted for in the individual utility functions. People often talk about equality being valuable even after accounting for risk aversion, but as Harsanyi's theorem shows, if you do add an extra term in the FAI's utility function to account for equality, then you risk designing an FAI that makes a choice that humanity unanimously disagrees with. Is this extra equality term so important to you that you would be willing to accept that?
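Here is a minimal sketch of how that can happen, in Python. Everything in it is a hypothetical illustration: two people, three outcomes with made-up VNM utilities, and an equality term of the form $-\left|u_{1}-u_{2}\right|$ added to the sum, which is just one possible way to write such a term and is not taken from Harsanyi. Both people strictly prefer a fair coin flip between A and B to the sure outcome C, yet an FAI maximizing the expected value of the equality-adjusted function picks C.

```python
# Hypothetical outcomes, each mapped to (utility for person 1, utility for person 2).
outcomes = {"A": (4.0, 0.0), "B": (0.0, 4.0), "C": (1.5, 1.5)}


def expected(lottery, f):
    """Expected value of f(outcome utilities) under a lottery {outcome: probability}."""
    return sum(p * f(outcomes[o]) for o, p in lottery.items())


def person(k):
    """Utility function of person k (0 or 1)."""
    return lambda u: u[k]


def plain_sum(u):
    """Linear aggregation with both weights equal to 1."""
    return u[0] + u[1]


def equality_adjusted(u):
    """Sum plus an assumed extra term that penalizes unequal outcomes."""
    return u[0] + u[1] - abs(u[0] - u[1])


coin_flip = {"A": 0.5, "B": 0.5}
sure_thing = {"C": 1.0}

for name, f in [("person 1", person(0)), ("person 2", person(1)),
                ("plain sum", plain_sum), ("equality-adjusted", equality_adjusted)]:
    better = "coin flip" if expected(coin_flip, f) > expected(sure_thing, f) else "sure thing"
    print(f"{name} prefers the {better}")
```

The people in this sketch are not ignoring risk: the numbers are already VNM utilities, so whatever risk aversion they have is built in, and they still prefer the coin flip.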

Remember that VNM utility has a precise decision-theoretic meaning. Twice as much utility does not correspond to your intuitions about what “twice as much goodness” means. Your intuitions about the best way to distribute goodness to people will not necessarily be good ways to distribute utility. The axioms I used were extremely rudimentary, whereas the intuition that generated "there should be a term for equality or something" is untrustworthy. If they come into conflict, you can't keep all of them. I don't see any way to justify giving up axioms 1 or 2, and axiom 3 will likely remain true whether you want it to or not, so you should probably give up whatever else you wanted to add to the FAI's utility function.

Citations:

Harsanyi, John C. "Cardinal welfare, individualistic ethics, and interpersonal comparisons of utility." The Journal of Political Economy (1955): 309-321.

Hammond, Peter J. "Harsanyi’s utilitarian theorem: A simpler proof and some ethical connotations." In R. Selten (ed.), Rational Interaction: Essays in Honor of John Harsanyi. 1992.

90 Comments

AlexMennen (13y)

> I don't agree with Harsanyi's claim that the linear combination of utility functions is unique up to linear transformations. I agree it is unique up to affine transformations, and the discrepancy between my statement and his is explained by his comment "on the understanding that the zero point of the social welfare function is appropriately chosen." (Why he didn't explicitly generalize to affine transformations is beyond me.)

I'm not quite sure what you mean. Are you talking about the fact that you can add a constant to a utility function without changing anything important, but that a constant is not necessarily a linear combination of the utility functions to be aggregated? For that reason, it might be best to implicitly include the constant function in any set of utility functions when talking about whether or not they are linearly independent; otherwise you can change the answer by adding a constant to one of them. Also, where did Harsanyi say that?

> I don't think the claim "the utility function can be expressed as a linear combination of the individual utility functions" is particularly meaningful, because it just means that the aggregated utility function must exist in the space spanned by the individual utility functions.

Yes, that's what it means. I don't see how that makes it unmeaningful.

Agreed that linear algebra is a natural way to approach this. In fact, I was thinking in similar terms. If you replace axiom 3 with the stronger assumption that the utility functions to be aggregated, along with the constant function, are linearly independent (which I think is still reasonable if there are an infinite number of outcomes, or even if there are just at least 2 more outcomes than agents), then it is fairly easy to show that sharing preferences requires the aggregation to be a linear combination of the utility functions and the constant function.

Let K represent the row vector with all 1s (a constant function). Let "pseudogamble" refer to column vectors whose elements add to 1 (Kx = 1). Note that given two pseudogambles x and y, we can find two gambles x' and y' such that for any agent A, A(x-y) has the same sign as A(x'-y') by mixing the pseudogambles with another gamble. For instance, if x, y, and z are outcomes, and A(x) > A(2y-z), then A(.5x+.5z) > A(.5(2y-z)+.5z) = A(y). So the fact that I'll be talking about pseudogambles rather than gambles is not a problem.
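In general form (a sketch; the mixing weight $\lambda$ is notation introduced here): for pseudogambles $x$ and $y$, a gamble $z$, and $0<\lambda\leq1$, set

$$x'=\lambda x+(1-\lambda)z,\qquad y'=\lambda y+(1-\lambda)z.$$

Then $Kx'=Ky'=1$, and because every agent $A$ is linear,

$$A(x')-A(y')=\lambda\,A(x-y),$$

which has the same sign as $A(x-y)$. The example above is the case $\lambda=.5$ applied to the pseudogambles $x$ and $2y-z$. For $x'$ and $y'$ to be genuine gambles, their entries must be nonnegative, which can be arranged by choosing $z$ with all entries positive and $\lambda$ small enough.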

Anyway, if the initial utility functions and K are linearly independent, then the aggregate not being a linear combination of the initial utility functions and K would mean that the aggregate, K, and the initial utility functions all together are linearly independent. Given a linearly independent set of row vectors, it is possible to find a column vector whose product with each row vector is independently specifiable. In particular, you can find column vectors x and y such that Kx=Ky=1, Ax>Ay for all initial utility functions A, and Sx<Sy, where S is the aggregate utility function.
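As a concrete check of that construction, here is a minimal sketch in Python using exact rational arithmetic. The two agents, the four outcomes, and the aggregate S (the product of the two utilities, outcome by outcome) are hypothetical, chosen so that K, the two utility functions, and S are linearly independent; `dot` and `solve` are helper names made up here. The code solves for a direction d with Kd=0, Ad=1 for each agent, and Sd=-1, then adds a small multiple of d to the uniform gamble so that x and y are genuine gambles rather than pseudogambles (a shortcut that differs slightly from the mixing argument above).

```python
from fractions import Fraction


def dot(row, col):
    """Product of a row vector (utility function) with a column vector (gamble)."""
    return sum(Fraction(a) * b for a, b in zip(row, col))


def solve(matrix, rhs):
    """Solve a square, nonsingular linear system exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry and swap it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


K = [1, 1, 1, 1]                      # constant function
A1 = [0, 4, 1, 3]                     # hypothetical agent 1
A2 = [4, 0, 1, 3]                     # hypothetical agent 2
S = [a * b for a, b in zip(A1, A2)]   # hypothetical non-linear aggregate (product)

# Direction that every agent likes (+1) and the aggregate dislikes (-1), with K d = 0.
d = solve([K, A1, A2, S], [0, 1, 1, -1])

y = [Fraction(1, 4)] * 4              # uniform gamble
# Largest step along d that keeps every probability nonnegative.
eps = min(yi / -di for yi, di in zip(y, d) if di < 0)
x = [yi + eps * di for yi, di in zip(y, d)]

print("x =", x, " y =", y)
print("agents prefer x:", dot(A1, x) > dot(A1, y), dot(A2, x) > dot(A2, y))
print("aggregate prefers y:", dot(S, x) < dot(S, y))
```

The solver assumes the four rows are linearly independent, which is exactly the hypothesis of the argument; if S were a linear combination of K, A1, and A2, no pivot would be found in the last column and `solve` would fail.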

Edit: I just realized that if we use Harsanyi's shared indifference criterion instead of my shared preference criterion, we don't even need the linear independence of the initial utility functions for that argument to work. You can find x and y such that Kx=Ky=1, Ax=Ay for all initial utility functions A, and Sx≠Sy if S is not a linear combination of the initial utility functions and K, whether or not the initial utility functions are linearly independent of each other, because if you ensure that Ax=Ay for a maximal linearly independent subset of the initial utility functions and K, then it follows that Ax=Ay for the others as well.

Vaniver (13y)

> Also, where did Harsanyi say that?

Immediately before the statement of Theorem I in section III.

> Yes, that's what it means. I don't see how that makes it unmeaningful.

In my mind, there's a meaningful difference between construction and description: yes, you can describe any waveform as an infinite series of sines and cosines, but if you actually want to build one, you probably want to use a finite series. And this result doesn't exclude any exotic methods of constructing utility functions; you could multiply together the utilities of each individual ...