DanArmak comments on Holden's Objection 1: Friendliness is dangerous - LessWrong

Post author: PhilGoetz 18 May 2012 12:48AM

Comment author: DanArmak 19 May 2012 05:19:12PM 0 points

Well, part of my point is that we can't even define CEV today, let alone solve it, so a lot of the conclusions and propositions people put forward about what CEV's output would be like are completely unsupported by evidence; they are mere wishful thinking.

More on-topic: today you have humans as black boxes, but you can still measure what they value, by 1) offering them concrete tradeoffs and observing their behavior, and 2) asking them.

Tomorrow, suppose your new brain-scanning tech allows you to understand perfectly how brains work. You can now explain how these values are implemented, but they are the same values you observed earlier. So the only new knowledge relevant to CEV is that you could derive how people would behave in hypothetical situations without actually putting them in those situations (which might be unethical or expensive).

Now, suppose someone expresses a value that you think they are merely "lying, trolling or joking" about. In all of their behavior throughout their lives, and in their own words today, they honestly have this value. But your brain scanner shows that in some hypothetical situation they would behave as if they valued it less.

By construction, since you couldn't derive this knowledge from their life histories (which were already known without a brain scanner), these are situations they have (almost) never been in. (And therefore they aren't likely to be in them in the future, either.)

So why do you effectively say that, for the purposes of CEV, their behavior in such counterfactual situations reflects "their true values", while their behavior in the real, common situations throughout their lives doesn't? Yes, humans placed in totally novel situations might reconsider their values, because humans have conflicting values, non-explicit values (behaviors responding to situations rather than stated principles), and no truly top-level goals (so that all values may change). But you could just as easily say that there are probably situations in which you could be placed so that you would come to value their values more.

Your approach places people in the unfortunate position where they might live their whole lives believing in a value, and fighting for it, and then you (or the CEV AI) come up to them and say: I'm going to destroy everything you've valued so far. Not because of objective ethics, or a decree of God, or a majority vote, or anything else objective and external, but because they themselves actually "really" prefer completely different values, even though on the conscious level, no matter how long they might think and talk and read about it, they would never reach that conclusion.

Comment author: gRR 19 May 2012 05:28:43PM 0 points

"In all of their behavior throughout their lives, and in their own words today, they honestly have this value"

This is the conditional that I believe is false when I say "they are probably lying, trolling or joking". I believe that when you use the brain scanner on those nihilists, and ask them whether they would prefer the world where everyone is dead to any other possible world, and they say yes, the brain scanner would show they are lying, trolling or joking.

Comment author: DanArmak 19 May 2012 05:31:08PM 1 point

OK. That's possible. But why do you believe that, despite their large numbers and lifelong avowal of those beliefs?

Comment author: JoshuaZ 19 May 2012 05:40:46PM 0 points

How would you respond if you were subjected to such a brain scan and then informed that deep inside you actually are a nihilist who prefers the complete destruction of all life?

Comment author: gRR 19 May 2012 05:43:53PM 0 points

I'd think someone's playing a practical joke on me.

Comment author: JoshuaZ 19 May 2012 05:49:24PM 0 points

And suppose we develop such brain-scanning technology, scan someone else who claims to want the destruction of all life, and it says "yep, he does". How would you respond?

Comment author: gRR 19 May 2012 05:56:37PM 0 points

Dunno... propose to kill them quickly and painlessly, maybe? But why do you ask? As I said, I don't expect this to happen.

Comment author: JoshuaZ 19 May 2012 05:59:12PM 0 points

That you don't expect it to happen shouldn't by itself be a reason not to consider it. I'm asking because it seems you are avoiding the hard questions by more or less saying you don't think they will happen. And there are many more conflicting value sets which are less extreme (and apparently more common) than this one.

Comment author: gRR 19 May 2012 06:11:09PM 0 points

Errr. This is a question of simple fact, which is either true or false. I believe it's true, and build my plans accordingly. We can certainly think about contingency plans for what to do if the belief turns out to be false, but so far no one has agreed that the plan is good even in the case where the belief is true.

Comment author: TheOtherDave 19 May 2012 09:39:25PM 0 points

You've lost me. Can you restate the question of simple fact to which you refer here, and which you believe to be true? And can you restate the plan that you consider good if that belief is true?

Comment author: gRR 19 May 2012 10:33:10PM 0 points

I believe there exist (extrapolated) wishes that are universal among humans (meaning, held by literally everyone). Among these wishes, I think, is the wish for humans to continue existing. I would like the AI to fulfil this wish (and any other universal wishes, if there are any), while letting people decide everything else for themselves.