DanArmak comments on Holden's Objection 1: Friendliness is dangerous - LessWrong
Comments (428)
Who cares about their extrapolated values? Not them (they keep their original values). Not others (who have different actual and extrapolated values). Then why extrapolate their values at all? You could very easily build a much happier life for them just by allocating some resources (land, computronium, whatever) and going by their current values.
Well... ok, let's assume a happy life is their single terminal value. Then by definition of their extrapolated values, you couldn't build a happier life for them if you did anything other than follow their extrapolated values!
This is completely wrong. People are happy, by definition, if their actual values are fulfilled; not if some conflicting extrapolated values are fulfilled. CEV was supposed to get around this by proposing (without saying how) that people would actually grow to become smarter etc. and thereby modify their actual values to match the extrapolated ones, and then they'd be happy in a universe optimized for the extrapolated (now actual) values. But you say you don't want to change other people's values to match the extrapolation. That makes CEV a very bad idea - most people will be miserable, probably including you!
Yes, but values depend on knowledge. There was an example by EY, I forgot where, in which someone values a blue box because they think the blue box contains a diamond. But if they're wrong, and it's actually the red box that contains the diamond, then what would actually make them happy - giving them the blue or the red box? And would you say giving them the red box is making them suffer?
Well, perhaps yes. Therefore, a good extrapolated wish would include constraints on the speed of its own fulfillment: allow the person to take the blue box, then convince them that it is the red box they actually want, and only then present it. But in cases where this is impossible (example: blue box contains horrible violent death), then it is wrong to say that following the extrapolated values (withholding the blue box) is making the person suffer. Following their extrapolated values is the only way to allow them to have a happy life.
What you are saying indeed applies only "in cases where this is impossible". I further suggest that these are extremely rare cases when a superhumanly-powerful AI is in charge. If the blue box contains horrible violent death, the AI would build a new (third) box, put a diamond inside, paint it blue, and give it to the person.
If the AI could do this, then this is exactly what the extrapolated values would tell it to do. [Assuming some natural constraints on the original values].
The actual values would also tell it to do so. This is a case where the two coincide. In most cases they don't.
No, the "actual" values would tell it to give the humans the blue boxes they want, already.
The humans don't value the blue box directly. It's an instrumental value because of what they think is inside. The humans really value (in actual, not extrapolated values) the diamond they think is inside.
That's a problem with your example (of the boxes): the values are instrumental, the boxes are not supposed to be valued in themselves.
ETA: wrong and retracted. See below.
Well, they don't value the diamond, either, on this account.
Perhaps they value the wealth they think they can have if they obtain the diamond, or perhaps they value the things they can buy given that diamond, or perhaps they value something else. It's hard to say, once we give up talking about the things we actually observe people trading other things for as being things they value.
Humans don't know which of their values are terminal and which are instrumental, and whether this question even makes sense in general. Their values were created by two separate evolutionary processes. In the boxes example, humans may not know about the diamond. Maybe they value blue boxes because their ancestors could always bring a blue box to a jeweler and exchange it for food, or something.
This is precisely the point of extrapolation - to untangle the values from each other and build a coherent system, if possible.