buybuydandavis comments on Harsanyi's Social Aggregation Theorem and what it means for CEV - Less Wrong Discussion

21 Post author: AlexMennen 05 January 2013 09:38PM

Comment author: buybuydandavis 05 January 2013 11:47:25PM 7 points

> But many people don't like this, usually for reasons involving utility monsters. If you are one of these people, then you better learn to like it, because according to Harsanyi's Social Aggregation Theorem, any alternative can result in the supposedly Friendly AI making a choice that is bad for every member of the population. More formally,

That a bad result can happen in a given strategy is not a conclusive argument against preferring that strategy. Will it happen? What's the likelihood that it happens? What's the cost if it does happen?

The two alternatives discussed each have their own failure mode, while your "better learn to like it" admonition seems to imply that one side is compelled by the failure mode of its preferred strategy to give it up for the alternative strategy.

Why is this new failure mode supposed to be decisive in the choice between the two alternatives?

Comment author: AlexMennen 06 January 2013 12:12:56AM 0 points

> That a bad result can happen in a given strategy is not a conclusive argument against preferring that strategy.

It's possible that the AI would just happen never to confront a situation where it would choose differently than everyone else would, but that can't be relied on. If you had an AI that violated axiom 2, it would be tempting to modify it to include the special case "If X is the best option in expectation for every morally relevant agent, then do X." It seems hard to argue that such a modification would not be an improvement. And yet throwing in only that special case would make it no longer VNM-rational. Being worse than a VNM-irrational agent is pretty bad.
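As an illustration of the kind of case at issue, here is a minimal Python sketch. Everything in it is an assumption made up for the example and is not taken from the post: two agents, three outcomes with invented utilities, and an "egalitarian" AI that maximizes the expected minimum utility (one example of a VNM-rational rule that is not a weighted sum). Under those assumptions, the egalitarian rule picks the option that is worse in expectation for both agents, while an equal-weight sum does not.

```python
# Utilities of each outcome for (agent 1, agent 2). Illustrative numbers only.
OUTCOMES = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.4, 0.4)}

# Two options the AI can choose between, each a lottery over outcomes.
LOTTERIES = {
    "coin_flip_A_or_B": {"A": 0.5, "B": 0.5},
    "sure_C": {"C": 1.0},
}


def expected_utilities(lottery):
    """Expected utility of a lottery for each agent separately."""
    return tuple(
        sum(p * OUTCOMES[o][i] for o, p in lottery.items()) for i in range(2)
    )


def expected_social_value(lottery, welfare):
    """Expected value of a social welfare function defined on outcomes."""
    return sum(p * welfare(OUTCOMES[o]) for o, p in lottery.items())


def choose(welfare):
    """Option picked by an AI maximizing the expected value of `welfare`."""
    return max(
        LOTTERIES,
        key=lambda name: expected_social_value(LOTTERIES[name], welfare),
    )


def weighted_sum(weights):
    """Social welfare as a weighted sum of the agents' utilities."""
    return lambda utils: sum(w * u for w, u in zip(weights, utils))


if __name__ == "__main__":
    for name, lottery in LOTTERIES.items():
        print(name, "expected utilities per agent:", expected_utilities(lottery))
    # Hypothetical egalitarian rule: maximize the expected minimum utility.
    print("expected-min rule picks:", choose(min))
    print("equal-weight sum picks: ", choose(weighted_sum((1.0, 1.0))))
```

Both agents expect 0.5 from the coin flip and 0.4 from the sure outcome, so the coin flip is better in expectation for everyone, yet the expected-min rule prefers the sure outcome because each individual result of the coin flip has a minimum of 0.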

> Why is this new failure mode supposed to be decisive in the choice between the two alternatives?

Because maximizing a weighted sum of utility functions does not have any comparably convincing failure modes. None that I've heard of anyway, and I'd be pretty shocked if you came up with a failure mode that did compete.

Comment author: buybuydandavis 06 January 2013 10:50:53PM 3 points

> Because maximizing a weighted sum of utility functions does not have any comparably convincing failure modes.

You don't think utility monster is a comparably convincing failure mode?

I think we just don't have data one way or the other.

Comment author: AlexMennen 06 January 2013 11:14:38PM * -2 points

Utility monster isn't a failure mode. It just messes with our intuitions because no one could imagine being a utility monster.

Edit: At the time I made this comment, the Wikipedia article on utility monsters incorrectly stated that a utility monster meant an agent that gets increasing marginal utility with respect to resources. Now that I know that a utility monster means an agent that gets much more utility from resources than other agents do, my response is that you can multiply the utility monster's utility function by a small coefficient, so that it no longer acts as a utility monster.
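A minimal sketch of the rescaling idea, under assumptions that are purely illustrative and not from the thread: 10 units of a resource split between one "monster" and one ordinary agent, square-root utilities (so both have diminishing returns), and a monster whose utility function is 100 times larger. The names `monster_share`, `MONSTER_SCALE`, and `TOTAL` are hypothetical.

```python
import math

TOTAL = 10             # units of resource to divide (illustrative)
MONSTER_SCALE = 100.0  # how much larger the monster's utility function is


def monster_utility(r):
    """Assumed utility of the monster from r units of resource."""
    return MONSTER_SCALE * math.sqrt(r)


def ordinary_utility(r):
    """Assumed utility of the ordinary agent from r units of resource."""
    return math.sqrt(r)


def monster_share(monster_weight, ordinary_weight=1.0):
    """Units given to the monster when maximizing the weighted sum."""
    def social(r):
        return (monster_weight * monster_utility(r)
                + ordinary_weight * ordinary_utility(TOTAL - r))
    # Search whole-unit allocations only, to keep the example simple.
    return max(range(TOTAL + 1), key=social)


if __name__ == "__main__":
    print("equal weights:       monster gets", monster_share(1.0))
    print("monster weight 0.01: monster gets", monster_share(0.01))
```

With equal weights the monster takes all 10 units; with its utility function multiplied by 0.01 the weighted sum is maximized by an even 5/5 split. How the coefficient should be chosen is a separate question that this sketch does not address.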