DanArmak comments on Holden's Objection 1: Friendliness is dangerous - Less Wrong

11 Post author: PhilGoetz 18 May 2012 12:48AM


Comment author: gRR 19 May 2012 05:34:59PM -1 points [-]

You could very easily build a much happier life for them just by allocating some resources (land, computronium, whatever) and going by their current values

Well... ok, let's assume a happy life is their single terminal value. Then, by definition of their extrapolated values, you couldn't build a happier life for them by doing anything other than following their extrapolated values!

Comment author: DanArmak 19 May 2012 06:09:10PM 0 points [-]

This is completely wrong. People are happy, by definition, if their actual values are fulfilled; not if some conflicting extrapolated values are fulfilled. CEV was supposed to get around this by proposing (without saying how) that people would actually grow to become smarter etc. and thereby modify their actual values to match the extrapolated ones, and then they'd be happy in a universe optimized for the extrapolated (now actual) values. But you say you don't want to change other people's values to match the extrapolation. That makes CEV a very bad idea - most people will be miserable, probably including you!

Comment author: gRR 20 May 2012 01:50:24AM 1 point [-]

People are happy, by definition, if their actual values are fulfilled

Yes, but values depend on knowledge. There was an example by EY, I forgot where, in which someone values a blue box because they think the blue box contains a diamond. But if they're wrong, and it's actually the red box that contains the diamond, then what would actually make them happy - giving them the blue or the red box? And would you say giving them the red box is making them suffer?

Well, perhaps yes. Therefore, a good extrapolated wish would include constraints on the speed of its own fulfillment: allow the person to take the blue box, then convince them that it is the red box they actually want, and only then present it. But in cases where this is impossible (example: the blue box contains horrible violent death), it is wrong to say that following the extrapolated values (withholding the blue box) is making the person suffer. Following their extrapolated values is the only way to allow them to have a happy life.
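To make the box example concrete, here is a minimal toy sketch in Python. The `Box` and `choose_*` names are purely illustrative assumptions, and it only models the knowledge gap (asking for the blue box vs. wanting the diamond), not extrapolation in general:

```python
# Toy illustration: stated preference vs. what the person would choose with full knowledge.
# All names here are hypothetical; this is the blue/red box example, not a model of CEV.

from dataclasses import dataclass

@dataclass
class Box:
    color: str
    contents: str  # what is really inside

def choose_by_stated_preference(boxes, stated_color):
    """Give the person the box they ask for (their actual, unextrapolated request)."""
    return next(b for b in boxes if b.color == stated_color)

def choose_by_extrapolated_value(boxes, terminal_goal):
    """Give the person the box containing what they terminally want (knowledge corrected)."""
    return next(b for b in boxes if b.contents == terminal_goal)

boxes = [Box("blue", "nothing"), Box("red", "diamond")]
print(choose_by_stated_preference(boxes, "blue").contents)   # -> nothing
print(choose_by_extrapolated_value(boxes, "diamond").color)  # -> red
```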

Comment author: DanArmak 22 May 2012 01:31:29PM 0 points [-]

What you are saying indeed applies only "in cases where this is impossible". I further suggest that these are extremely rare cases when a superhumanly-powerful AI is in charge. If the blue box contains horrible violent death, the AI would build a new (third) box, put a diamond inside, paint it blue, and give it to the person.

Comment author: gRR 22 May 2012 03:56:49PM 0 points [-]

the AI would build a new (third) box, put a diamond inside, paint it blue, and give it to the person

If the AI could do this, then this is exactly what the extrapolated values would tell it to do. [Assuming some natural constraints on the original values].

Comment author: DanArmak 22 May 2012 04:15:30PM 0 points [-]

The actual values would also tell it to do so. This is a case where the two coincide. In most cases they don't.

Comment author: gRR 22 May 2012 04:20:45PM 1 point [-]

No, the "actual" values would tell it to give the humans the blue boxes they want, already.

Comment author: DanArmak 22 May 2012 04:30:55PM *  0 points [-]

The humans don't value the blue box directly. It's an instrumental value because of what they think is inside. The humans really value (in actual, not extrapolated values) the diamond they think is inside.

That's a problem with your example (of the boxes): the values are instrumental, the boxes are not supposed to be valued in themselves.

ETA: wrong and retracted. See below.

Comment author: TheOtherDave 22 May 2012 04:50:24PM 2 points [-]

Well, they don't value the diamond, either, on this account.

Perhaps they value the wealth they think they can have if they obtain the diamond, or perhaps they value the things they can buy with that diamond, or perhaps they value something else. It's hard to say, once we give up treating the things we actually observe people trading other things for as the things they value.

Comment author: DanArmak 22 May 2012 06:31:17PM 0 points [-]

You're right and I was wrong on this point. Please see my reply to gRR's sister comment.

Comment author: gRR 22 May 2012 04:50:06PM 2 points [-]

Humans don't know which of their values are terminal and which are instrumental, and whether this question even makes sense in general. Their values were created by two separate evolutionary processes. In the boxes example, humans may not know about the diamond. Maybe they value blue boxes because their ancestors could always bring a blue box to a jeweler and exchange it for food, or something.

This is precisely the point of extrapolation - to untangle the values from each other and build a coherent system, if possible.

Comment author: DanArmak 22 May 2012 06:30:52PM *  1 point [-]

You're right about this point (and so is TheOtherDave) and I was wrong.

With that, I find myself unsure as to what we agree and disagree on. Back here you said "Well, perhaps yes." I understand that to mean you agree with my point that it's wrong / bad for the AI to promote extrapolated values while the actual values are different and conflicting. (If this is wrong please say so.)

Talking further about "extrapolated" values may be confusing in this context. I think we can taboo that and reach all the same conclusions while only mentioning actual values.

The AI starts out by implementing humans' actual present values. If some values (want blue box) lead to actually-undesired outcomes (blue box really contains death), that is a case of conflicting actual values (want blue box vs. want to not die). The AI obviously needs to be able to manage conflicting actual values, because humans always have them, but that is true regardless of CEV.

Additionally, the AI may foresee that humans are going to change and in the future have some other actual values; call these the future-values. This change may be described as "gaining intelligence etc." (as in CEV) or it may be a different sort of change - it doesn't matter for our purposes. Suppose the AI anticipates this change, and has no imperative to prevent it (such as helping humans avoid murderer-Gandhi pills due to present human values), or maybe even has an imperative to assist this change (again, according to current human values). Then the AI will want to avoid doing things today which will make its task harder tomorrow, or which will cause future people to regret their past actions: it may find itself striking a balance between present and future (predicted) human values.
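As a toy sketch only, that balancing act could be pictured as a weighted trade-off between present and predicted future values. The weights, scores, and function names below are arbitrary assumptions for illustration, not anything CEV or this discussion specifies:

```python
# Toy sketch: scoring a candidate action against present values and predicted
# future values. The 0.7 / 0.3 weights are arbitrary illustrative choices.

def score_action(action, present_values, predicted_future_values,
                 weight_present=0.7, weight_future=0.3):
    present_score = sum(v(action) for v in present_values)
    future_score = sum(v(action) for v in predicted_future_values)
    return weight_present * present_score + weight_future * future_score

# Hypothetical "value" functions mapping an action description to a satisfaction score.
present_values = [lambda a: 1.0 if a == "give blue box" else 0.0]
predicted_future_values = [lambda a: 1.0 if a == "give red box" else 0.0]

best = max(["give blue box", "give red box"],
           key=lambda a: score_action(a, present_values, predicted_future_values))
print(best)  # -> "give blue box" under these particular weights
```

The point of the sketch is only that any such weighting shortchanges somebody: whatever the weights, either present or predicted future values are satisfied less than fully.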

This is, at the very least, dangerous, because it involves satisfying current human values less fully than possible while the AI may be wrong about the future values. Also, the AI's actions unavoidably influence humans and so probably influence which future values they eventually have. My position is that the AI must be guided by humans' actual present values in choosing whether to steer human (social) evolution towards or away from possible future values. This has lots of downsides, but what better option is there?

In contrast, CEV claims there is some unique "extrapolated" set of future values which is special, stable once reached, and universal for all humans, and that it's Good to steer humanity towards it even if it conflicts with many people's present values. But I haven't seen any arguments I find convincing that such "extrapolated" values exist and have any of those qualities (uniqueness, stability, universal compatibility, Goodness).

Do you agree with this summary? Which points do you disagree with me on?