Jiro comments on The Hidden Complexity of Wishes - Less Wrong

Post author: Eliezer_Yudkowsky 24 November 2007 12:12AM, 58 points

Comments (121)

Comment author: Jiro 24 August 2013 05:05:29PM, 0 points

Superintelligence can use strategies you can't understand.

There's a contradiction between "the superintelligence will do something you don't want" and "the superintelligence will do something you don't understand". Not wanting it implies I understand enough about it to not want it (even if I don't understand every single step).

that's kind of the definition of a clueless genie

I would consider a clueless genie to be a genie that tries to grant my wishes, but because it doesn't understand me, grants my wishes in a way that I wouldn't want. A malicious genie is a genie that grants my wishes in a way that it knows I wouldn't want. Reserving the term "malicious" for genies that intentionally annoy, while excluding genies that merely knowingly annoy, is hairsplitting and only changes the terminology anyway.

Also, some Good solutions might require fooling you for your own good, if only because there's no time to explain.

If I would in fact want genies to fool me for my own good in such situations, this isn't a problem.

On the other hand, if I think that genies should not try to fool me for my own good in such situations, and the genie knows this and fools me for my own good anyway, it's a malicious genie by my standards. The genie has not failed to understand me; it understands what I want perfectly well, but knowingly acts against its understanding of my desires. In the original example, the genie is asked to get my mother out of a building; it knows that I don't want it to explode the building to get her out, and it explodes the building anyway.

Comment author: MugaSofer 26 August 2013 03:17:24PM, 0 points

There's a contradiction between "the superintelligence will do something you don't want" and "the superintelligence will do something you don't understand". Not wanting it implies I understand enough about it to not want it (even if I don't understand every single step).

Well, firstly, there might be things you wouldn't want if you could only understand them. But actually, I was thinking of actions that would affect society in subtle, sweeping ways. Sure, if the results were explained to you, you might not like them, but you built the genie to grant wishes, not explain them. And how sure are you that's even possible, for all possible wish-granting methods?

I would consider a clueless genie to be a genie that tries to grant my wishes, but because it doesn't understand me, grants my wishes in a way that I wouldn't want. A malicious genie is a genie that grants my wishes in a way that it knows I wouldn't want. Reserving that term for genies that intentionally annoy while excluding genies that merely knowingly annoy is hairsplitting and only changes the terminology anyway.

Well, that's what the term usually means. And, honestly, I think there's good reason for that: it takes a pretty precise definition of "non-malicious genie" (AKA FAI) for a genie not to do Bad Things, which is kind of the point of this essay.

Comment author: Jiro 26 August 2013 03:36:26PM, 2 points

Sure, if the results were explained to you, you might not like them, but you built the genie to grant wishes, not explain them.

That's why I suggested you can talk to the genie. Provided the genie is not malicious, it shouldn't conceal any such consequences; you just need to quiz it well.

It's sort of like the Turing test, but used to determine wish acceptability instead of intelligence. If a human can talk to it and conclude it is a person, treat it like a person. If a human can talk to it and decide the wish is good, treat the wish as good. And just like the Turing test, it relies on the fact that humans are better at asking questions as the conversation unfolds than at writing long lists of prearranged questions that try to cover every situation in advance.

Well, that's what the term usually means.

Really? A clueless genie is a genie that is asked to do something, knows that the way it does it is displeasing to you, and does it anyway? I wouldn't call that a clueless genie.

What terms would you use for

-- a genie that would never knowingly displease you in granting wishes, but may do so out of ignorance

-- a genie that will knowingly displease you in granting wishes

-- a genie that will deliberately displease you in granting wishes?

Comment author: MugaSofer 26 August 2013 04:35:53PM, 1 point

More full response coming soon to a comment box near you. For now, terms! Everyone loves terms.

Really?

Here's how I learned it:

A "genie" will grant your wishes, without regard to what you actually want.

A malicious genie will grant your wishes, but deliberately seek out ways to do so that will do things you don't actually want.

A helpful - or Friendly - genie will work out what you actually wanted in the first place, and just give you that, without any of this tiresome "wishing" business. Sometimes called a "useful" genie - there's really no one agreed-on term. Essentially, it's what you're trying to replicate with carefully worded wishes to other genies.

Comment author: Jiro 26 August 2013 08:19:50PM, 0 points

I want to know what terms you would use that would distinguish between a genie that grants wishes in ways I don't want because it doesn't know any better, and a genie that grants wishes in ways I don't want despite knowing better.

By your definitions above, these are both just "genie" and you don't really have terms to distinguish between them at all.

Comment author: MugaSofer 26 August 2013 09:39:27PM, 0 points

Well, since the whole genie thing is a metaphor for superintelligence, "this genie is trying to be Friendly but it's too dumb to model you well" doesn't really come up. If it did, I guess you would need to invent a new term (Friendly Narrow AI?) to distinguish it, yeah.

Comment author: Jiro 26 August 2013 10:15:41PM, 0 points

It's my impression that the typical scenario of a superintelligence that kills everyone to make paperclips, because you told it to make paperclips, falls into the first category. It's trying to follow your request; it just doesn't know that your request really means "I want to make paperclips, subject to some implicit constraints such as ethics, being able to stop when told to stop, etc." If it does know what your request really means, yet it still maximizes paperclips by killing people, it's disobeying your intention if not your literal words.

(And then there's always the possibility of telling it "make paperclips, in the way that I mean when I ask that". If you say that, and the AI still kills people, it's unfriendly by both our standards--since your request explicitly told it to follow your intention, disobeying your intention also disobeys your literal words.)

Comment author: MugaSofer 28 August 2013 06:19:42PM, 0 points

It's trying to follow your request; it just doesn't know that your request really means "I want to make paperclips, subject to some implicit constraints such as ethics, being able to stop when told to stop, etc." If it does know what your request really means, yet it still maximizes paperclips by killing people, it's disobeying your intention if not your literal words.

Well, sure it is. That's the point of genies (and the analogous point about programming AIs): they do what you tell them, not what you wanted.

Comment author: private_messaging 28 August 2013 07:54:33PM, 1 point

What you tell them is a pattern of pressure changes in the air; only megaphones and tape recorders literally "do what you tell them".

The genie that would do what you want would have to use the pressure changes as a clue for deducing your intent. When writing a story about a genie that does "what you tell them, not what you wanted", you have to use the pressure changes as a clue for deducing some range of misunderstandings of those orders, and then pick the misunderstanding that you think makes the best story. It may be that we have an innate mechanism for finding the range of possible misunderstandings, so that we can combine following orders with self-interest.

Comment author: ArisKatsaris 28 August 2013 08:16:01PM, 5 points

"What you tell them" in the context of programs is meant in the sense of "What you program them to", not in the sense of "The dictionary definition of the word-noises you make when talking into their speakers".