Qiaochu_Yuan comments on Interpersonal and intrapersonal utility comparisons - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Why? (Edit:) First of all, I don't think that's how the CEV is intended to be used. As I understand it, the CEV is something we extrapolate from all of humanity, and it is a single utility function rather than a utility function for each person. Second, in this context I don't see why the CEV has to be a "sum", over agents, of any particular function (call it "welfare" if you like). For example, maybe I really value fairness and don't want one agent's preferences to be satisfied too much more than the others'; this would be one way to guard against utility monsters.
Another reason taking "sums" is problematic is that it can be gamed by duplicating an agent's preferences in other agents, e.g. for humans by raising a large number of children.
Actually, your personal preferences are your CEV, not some function that also takes into account other people's CEVs. That's what a CEV is. The point of having a friendly AI aggregate different people's individual preferences together is so that everyone will be able to cooperate on making it, instead of some people having an incentive to interfere (and also because we tend to think about friendly AI in far mode, which has strong fairness norms).
You could suggest that everyone's CEV should be aggregated in some non-additive way, but this risks situations where the aggregation makes a choice that everyone whose preferences got aggregated disagrees with. A weighted sum is the only way to aggregate utility functions that consistently avoids this. I've sketched out a proof of this, but I'm getting tired, so I'll write it up tomorrow.
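To make the failure mode concrete, here is a toy example of my own construction (the numbers and the "expected minimum" rule are assumptions for illustration, not from the comment): a maximally fair, non-additive aggregation that takes the expected value of the worse-off agent's utility ends up preferring an outcome that both agents disprefer.

```python
# Two agents with utilities over three outcomes, given as (u1, u2).
outcomes = {
    "A": (2.0, 0.0),
    "B": (0.0, 2.0),
    "C": (0.9, 0.9),
}

# Lottery L: a 50/50 coin flip between A and B.
lottery = {"A": 0.5, "B": 0.5}

def expected(agent, dist):
    """Agent's expected utility under a probability distribution over outcomes."""
    return sum(p * outcomes[o][agent] for o, p in dist.items())

# Both agents strictly prefer the lottery L (expected utility 1.0 each)
# to the sure outcome C (utility 0.9 each).
assert expected(0, lottery) > expected(0, {"C": 1.0})
assert expected(1, lottery) > expected(1, {"C": 1.0})

def fair_agg(dist):
    """Non-additive aggregation: expected value of min(u1, u2)."""
    return sum(p * min(outcomes[o]) for o, p in dist.items())

# The aggregation ranks C (0.9) above L (0.0), against the unanimous
# preference of the agents it aggregates.
assert fair_agg({"C": 1.0}) > fair_agg(lottery)
```

A weighted sum cannot do this: if both agents' expected utilities go up, any nonnegative-weighted sum of them goes up too.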
Weighted sums are fine, so you can just make the duplicates count less. In fact, as I pointed out in the post, there's no such thing as an unweighted sum.
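A quick sketch of that point, with made-up numbers (my illustration, not from the thread): an unweighted sum is tilted by duplicating an agent, and downweighting the copies restores the original two-person aggregation.

```python
# Two agents with opposed preferences over options X and Y.
u_alice = {"X": 1.0, "Y": 0.0}
u_bob   = {"X": 0.0, "Y": 1.0}

def weighted_sum(weights, utils, option):
    """Weighted-sum aggregation of several utility functions at one option."""
    return sum(w * u[option] for w, u in zip(weights, utils))

# Duplicate Bob. With unit weights, the "unweighted" sum now favors Y:
# X scores 1.0 but Y scores 2.0.
utils = [u_alice, u_bob, u_bob]
assert weighted_sum([1, 1, 1], utils, "Y") > weighted_sum([1, 1, 1], utils, "X")

# Give each Bob-copy weight 1/2: the tie between X and Y (each scoring 1.0),
# as in the original two-person sum, is restored.
assert weighted_sum([1, 0.5, 0.5], utils, "X") == weighted_sum([1, 0.5, 0.5], utils, "Y")
```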
Edit: Apparently the contents of your comment changed drastically as I was drafting this response. But it looks like this still mostly makes sense as a response.
My bad.
I don't think this is how Eliezer is using the term. From the wiki:
So this is 1) a single utility function, not a utility function for each human, and 2) being an aggregate of everything humanity wants, it naturally includes information about what each human wants.
I would be very interested to see this proof! In particular, I want to know what assumptions you're making. As I mentioned way up in the parent comment, I don't see how a weighted sum captures a friendly AI that has preferences about the utility functions that humans use.
Okay, but it still aggregates a utility function-like thing for each human. I don't care what you call it.
For the case of aggregating two people's preferences, only that: 1) both people and the aggregation are VNM utility agents; 2) whenever both people prefer A to B, the aggregation prefers A to B; and 3) the previous assumption is non-vacuous. Given those, the aggregation must maximize a weighted sum of their utility functions. For the many-person case, I was using analogous assumptions, but I think there might be a flaw in my induction, so I'll get back to you when I have a proof that actually works.
Edit: http://www.stanford.edu/~hammond/HarsanyiFest.pdf
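For reference, the two-agent result being sketched is Harsanyi's social aggregation theorem (the subject of the linked paper); in its usual form the conclusion is that the aggregation's utility is an affine combination of the individual utilities:

```latex
% Harsanyi's social aggregation theorem, two-agent case.
% If agents 1, 2 and the aggregation are all VNM agents over the same
% lotteries, and the aggregation satisfies the Pareto condition
% (non-vacuously), then there exist weights a_1, a_2 \ge 0 (not both
% zero) and a constant b such that, for every lottery \ell,
\[
  U_{\mathrm{agg}}(\ell) \;=\; a_1\, U_1(\ell) \;+\; a_2\, U_2(\ell) \;+\; b .
\]
```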
We currently have preferences about the utility functions that future humans use. So any linear aggregation of our current utility functions will also have preferences about the utility functions that future humans use.