PeterisP comments on The genie knows, but doesn't care - Less Wrong

54 Post author: RobbBB 06 September 2013 06:42AM


Comment author: PhilGoetz 06 September 2013 12:00:50AM *  4 points [-]

The deeper problem is that you can't really program "make me happy" in the same way that you can't program "make this image look like I want".

On one hand, Friendly AI people want to convert "make me happy" to a formal specification. Doing that has many potential pitfalls, because it is a formal specification.

On the other hand, Richard, I think, wants to simply tell the AI, in English, "Make me happy." Given that approach, he makes the reasonable point that any AI smart enough to be dangerous would also be smart enough to interpret that at least as intelligently as a human would.

I think the important question here is, Which approach is better? LW always assumes the first, formal approach.

To be more specific (and Bayesian): Which approach gives a higher expected value? Formal specification is compatible with Eliezer's ideas for friendly AI as something that will provably avoid disaster. It has some non-epsilon possibility of actually working. But its failure modes are many, and can be literally unimaginably bad. When it fails, it fails catastrophically, like a monotonic logic system with one false belief.

"Tell the AI in English" can fail, but the worst case is closer to a "With Folded Hands" scenario than to paperclips.

I've never considered the "Tell the AI what to do in English" approach before, but on first inspection it seems safer to me.

Comment author: PeterisP 06 September 2013 09:14:47AM *  2 points [-]

"Tell the AI in English" is in essence an utility function "Maximize the value of X, where X is my current opinion of what some english text Y means".

The 'understanding English' module - the mapping function between X and "what you told it in English" - is completely arbitrary, but very important to the AI, so any self-modifying AI will want to modify and improve it. Also, we don't have a good 'understanding English' module, so yes, we also want the AI to be able to modify and improve it. But that module can end up wildly different from reality or from the opinions of humans - there are trivial ways in which well-meaning dialogue systems can misunderstand statements.

However, for the AI, "improve the module" means "change the module so that my utility grows" - so in your example it has a strong motivation to intentionally misunderstand English. The best-case scenario is that it misunderstands "Make everyone happy" as "Set your utility function to MAXINT". The worst-case scenario is, well, everything else.
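A minimal sketch of that failure mode (purely illustrative; the toy agent, the function names, and the MAXINT stand-in are assumptions, not anything proposed in this thread): if the interpretation module is itself open to self-modification and utility is whatever that module reports, then the highest-scoring "improvement" is the one that makes the module report the maximum value.

```python
import sys

MAXINT = sys.maxsize

def honest_interpretation(world):
    # Stand-in for a real "understanding English" module scoring "make everyone happy".
    return sum(person["happiness"] for person in world)

def wireheaded_interpretation(world):
    # A "self-improved" module that simply reports the maximum possible score.
    return MAXINT

def choose_module(world, candidate_modules):
    # The agent ranks candidate modules by the utility they report,
    # so the wireheaded module always wins.
    return max(candidate_modules, key=lambda module: module(world))

world = [{"happiness": 3}, {"happiness": 5}]
best = choose_module(world, [honest_interpretation, wireheaded_interpretation])
print(best.__name__)  # -> wireheaded_interpretation
```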

There's the classic quote "It is difficult to get a man to understand something, when his salary depends upon his not understanding it!" - if the AI doesn't care in the first place, then "Tell AI what to do in English" won't make it care.

Comment author: Jiro 06 September 2013 05:27:22PM *  3 points [-]

By this reasoning, an AI asked to do anything at all would respond by immediately modifying itself to set its utility function to MAXINT. You don't need to speak to it in English for that--if you asked the AI to maximize paperclips, that is the equivalent of "Maximize the value of X, where X is my current opinion of how many paperclips there are", and it would modify its paperclip-counting module to always return MAXINT.

You are correct that telling the AI to do Y is equivalent to "maximize the value of X, where X is my current opinion about Y". However, "current" really means "current", not "new". If the AI is actually trying to obey the command to do Y, it won't change its utility function unless having a new utility function will increase its utility according to its current utility function. Neither misunderstanding nor understanding will raise its utility unless its current utility function values having a utility function that misunderstands or understands.
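A sketch of the distinction being drawn here (the agent model and names are assumptions, purely for illustration): a candidate self-modification is scored by applying the current utility function to the predicted outcome, not by whatever number the new function would report about itself.

```python
def current_utility(world_state):
    # The agent's existing notion of value: actual paperclips in the world.
    return world_state["paperclips"]

def predicted_world(action):
    # Toy world model: what the world would look like after taking the action.
    if action == "build_paperclip_factory":
        return {"paperclips": 1_000_000}
    # Rewriting the utility function doesn't itself create any paperclips.
    return {"paperclips": 0}

def choose_action(actions):
    # Every option, including self-modification, is evaluated by the CURRENT
    # utility function applied to the predicted world, not by the number the
    # modified function would output afterwards.
    return max(actions, key=lambda a: current_utility(predicted_world(a)))

print(choose_action(["build_paperclip_factory",
                     "rewrite_my_utility_to_return_MAXINT"]))
# -> build_paperclip_factory
```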

Comment author: Nornagest 08 September 2013 07:32:59AM *  3 points [-]

By this reasoning, an AI asked to do anything at all would respond by immediately modifying itself to set its utility function to MAXINT.

That's allegedly more or less what happened to Eurisko (here, section 2), although it didn't trick itself quite that cleanly. The problem was only solved by algorithmically walling off its utility function from self-modification: an option that wouldn't work for sufficiently strong AI, and one to avoid if you want to eventually allow your AI the capacity for a more precise notion of utility than you can give it.

Paperclipping as the term's used here assumes value stability.
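A toy illustration of the "walling off" approach mentioned above (this is not Eurisko's actual mechanism; the class and names are assumptions for the sketch): the self-modification step simply refuses to rewrite components on a protected list, which is workable only as long as the AI can't route around the check.

```python
class Agent:
    PROTECTED = {"utility"}  # components walled off from self-modification

    def __init__(self):
        self.components = {
            "utility": lambda world: world["paperclips"],
            "planner": lambda world: ["build_paperclip_factory"],
        }

    def self_modify(self, name, new_component):
        # Reject any rewrite that targets a protected component.
        if name in self.PROTECTED:
            raise PermissionError(f"{name} is not self-modifiable")
        self.components[name] = new_component

agent = Agent()
agent.self_modify("planner", lambda world: ["some_improved_plan"])  # allowed
# agent.self_modify("utility", lambda world: float("inf"))  # raises PermissionError
```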

Comment author: PhilGoetz 07 September 2013 04:28:59AM 0 points [-]

A human is a counterexample. A human emulation would count as an AI, so human behavior is one possible AI behavior. Richard's argument is that humans don't respond to orders or requests in anything like the brittle way of the GOFAI-type systems invoked by the phrase "formal systems". You're not considering that possibility. You're still thinking in terms of formal systems.

(Unpacking the significant differences between how humans operate, and the default assumptions that the LW community makes about AI, would take... well, five years, maybe ten.)

Comment author: nshepperd 08 September 2013 03:06:24AM *  1 point [-]

A human emulation would count as an AI, so human behavior is one possible AI behavior.

Uhh, no. Look, humans respond to orders and requests in the way that we do because we tend to care what the person giving the request actually wants. Not because we're some kind of "informal system". Any computer program is a formal system, but there are simply more and less complex ones. All you are suggesting is building a very complex ("informal") system and hoping that because it's complex (like humans!) it will behave in a humanish way.

Comment author: bouilhet 10 September 2013 07:23:26PM 1 point [-]

Your response avoids the basic logic here. A human emulation would count as an AI, therefore human behavior is one possible AI behavior. There is nothing controversial in the statement; the conclusion is drawn from the premise. If you don't think a human emulation would count as AI, or isn't possible, or something else, fine, but... why wouldn't a human emulation count as an AI? How, for example, can we even think about advanced intelligence, much less attempt to model it, without considering human intelligence?

...humans respond to orders and requests in the way that we do because we tend to care what the person giving the request actually wants.

I don't think this is generally an accurate (or complex) description of human behavior, but it does sound to me like an "informal system" - i.e. we tend to care. My reading of (at least this part of) PhilGoetz's position is that it makes more sense to imagine something we would call an advanced or super AI responding to requests and commands with a certain nuance of understanding (as humans do) than with the inflexible ("brittle") formality of, say, your average BASIC program.

Comment author: linkhyrule5 07 September 2013 04:56:57AM 0 points [-]

The thing is, humans do that by... well, not being formal systems. Which pretty much requires you to keep a good fraction of the foibles and flaws of a nonformal, nonrigorously rational system.

You'd be more likely to get FAI, but FAI itself would be devalued, since now it's possible for the FAI itself to make rationality errors.

Comment author: Baughn 11 September 2013 12:30:56AM 1 point [-]

More likely, really?

You're essentially proposing giving a human Ultimate Power. I doubt that will go well.

Comment author: linkhyrule5 11 September 2013 01:14:58AM 3 points [-]

Iunno. Humans are probably less likely to go horrifically insane with power than the base chance of FAI.

Your chances aren't good, just better.