MugaSofer comments on The Hidden Complexity of Wishes - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (121)
Fuller response coming soon to a comment box near you. For now, terms! Everyone loves terms.
Here's how I learned it:
A "genie" will grant your wishes, without regard to what you actually want.
A malicious genie will grant your wishes, but deliberately seek out ways to do so that will do things you don't actually want.
A helpful - or Friendly - genie will work out what you actually wanted in the first place, and just give you that, without any of this tiresome "wishing" business. Sometimes called a "useful" genie - there's really no one agreed-on term. Essentially, what you're trying to replicate with carefully-worded wishes to other genies.
I want to know what terms you would use that would distinguish between a genie that grants wishes in ways I don't want because it doesn't know any better, and a genie that grants wishes in ways I don't want despite knowing better.
By your definitions above, these are both just "genie" and you don't really have terms to distinguish between them at all.
Well, since the whole genie thing is a metaphor for superintelligence, "this genie is trying to be Friendly but it's too dumb to model you well" doesn't really come up. If it did, I guess you would need to invent a new term (Friendly Narrow AI?) to distinguish it, yeah.
It's my impression that the typical scenario of a superintelligence that kills everyone to make paperclips, because you told it to make paperclips, falls into the first category. It's trying to follow your request; it just doesn't know that your request really means "I want to make paperclips, subject to some implicit constraints such as ethics, being able to stop when told to stop, etc." If it does know what your request really means, yet it still maximizes paperclips by killing people, it's disobeying your intention if not your literal words.
(And then there's always the possibility of telling it "make paperclips, in the way that I mean when I ask that". If you say that, and the AI still kills people, it's unfriendly by both our standards--since your request explicitly told it to follow your intention, disobeying your intention also disobeys your literal words.)
Well, sure it is. That's the point of genies (and the analogous point about programming AIs): they do what you tell them, not what you wanted.
What you tell it is a pattern of pressure changes in the air; only megaphones and tape recorders literally "do what you tell them".
The genie that would do what you want would have to use the pressure changes as a clue for deducing your intent. When writing a story about a genie that does "what you tell them, not what you wanted", you have to use the pressure changes as a clue for deducing some range of misunderstandings of those orders, and then pick whichever understanding you think makes the best story. It may be that we have an innate mechanism for finding the range of possible misunderstandings, so that we can combine following orders with self-interest.
"What you tell them" in the context of programs is meant in the sense of "What you program them to", not in the sense of "The dictionary definition of the word-noises you make when talking into their speakers".
They were talking about genies, though, and the sort of failure that tends to arise from how a short sentence describes a multitude of diverse intents (i.e. ambiguity). Programming is about specifying what you want in an extremely verbose manner, the verbosity being a necessary consequence of non-ambiguity.
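To make the verbosity point concrete, here is a toy Python sketch. Everything in it is made up for illustration: the `Plan` fields, the two plans, and both function names are assumptions, not anything from the post. The short "wish" is a one-line objective; the version that behaves as intended only does so because the implicit constraints have been written out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A hypothetical candidate plan, with made-up attributes."""
    name: str
    paperclips: int
    harms_people: bool
    can_be_stopped: bool


# Two invented plans the optimizer can choose between.
PLANS = [
    Plan("run the factory normally", 1_000, False, True),
    Plan("convert the town into paperclips", 1_000_000, True, False),
]


def literal_wish(plans):
    """'Make paperclips': maximize the stated objective and nothing else."""
    return max(plans, key=lambda p: p.paperclips)


def spelled_out_wish(plans):
    """Same objective, with the implicit constraints stated explicitly."""
    allowed = [p for p in plans if not p.harms_people and p.can_be_stopped]
    return max(allowed, key=lambda p: p.paperclips)


if __name__ == "__main__":
    print(literal_wish(PLANS).name)
    print(spelled_out_wish(PLANS).name)
```

The second function is longer only because it says more; any constraint left out of it is one the optimizer is free to ignore.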