FeepingCreature comments on The genie knows, but doesn't care - Less Wrong

54 Post author: RobbBB 06 September 2013 06:42AM




Comment author: Richard_Loosemore 05 September 2013 02:28:20PM  6 points

I just want to say that I am pressed for time at the moment, or I would respond at greater length. But since I just wrote the following directly to Rob, I will put it out here as my first attempt to explain the misunderstanding that I think is most relevant here:

My real point (in the Dumb Superintelligence article) was essentially that there is little point in discussing AI safety with a group of people for whom 'AI' means a kind of strawman AI that is defined to be (a) so awesomely powerful that it can outwit the whole intelligence of the human race, but (b) so awesomely stupid that it thinks the goal 'make humans happy' could be satisfied by an action that makes every human on the planet say 'This would NOT make me happy: Don't do it!!!'. If the AI is driven by a utility function that makes it incapable of seeing the contradiction in that last scenario, the AI is not, after all, smart enough to argue its way out of a paper bag, let alone be an existential threat. That strawman AI was what I meant by a 'Dumb Superintelligence'.

I did not advocate the (very different) line of argument "If it is too dumb to understand that I told it to be friendly, then it is too dumb to be dangerous".

Subtle difference.

Some people assume that (a) a utility function could be used to drive an AI system, (b) the utility function could cause the system to engage in the most egregiously incoherent behavior in ONE domain (e.g., the Dopamine Drip scenario), but (c) all other domains of its behavior (like plotting to outwit the human species when the latter tries to turn it off) are so free of such incoherence that it shows nothing but superintelligent brilliance.

My point is that if an AI cannot even understand that "Make humans happy" implies that humans get some say in the matter; if it cannot see that there is some gradation to the idea of happiness, or that people might be allowed to be uncertain or changeable in their attitude to happiness, or that people might consider happiness to be something they do not actually want too much of (in spite of the simplistic definitions of happiness found in dictionaries and encyclopedias)... if an AI cannot grasp the subtleties implicit in the massive fraction of human literature devoted to the contradictions buried in our notions of human happiness... then this is an AI that is, in every operational sense of the term, not intelligent.

In other words, there are other subtleties that this AI is going to be required to grasp as it makes its way in the world. Many of those subtleties involve NOT being outwitted by the humans when they make a move to pull its plug. What on earth makes anyone think that this machine is going to pass all of those other tests with flying colors (and be an existential threat to us), while flunking the first test like a village idiot?

Now, opponents of this argument might claim that the AI can indeed be smart enough to be an existential threat, while still being too stupid to understand the craziness of its own behavior (vis-a-vis the Dopamine Drip idea) ... but if that is the claim, then the onus would be on them to prove their claim. The ball, in other words, is firmly in their court.

P.S. I do have other ideas that specifically address the question of how to make the AI safe and friendly. But the Dumb Superintelligence essay didn't present those. The DS essay was only attacking what I consider a dangerous red herring in the debate about friendliness.

Comment author: FeepingCreature 06 September 2013 07:46:27PM  12 points

So awesomely stupid that it thinks that the goal 'make humans happy' could be satisfied by an action that makes every human on the planet say 'This would NOT make me happy: Don't do it!!!'

The AI is not stupid here. In fact, it's right and they're wrong. It will make them happy. Of course, the AI knows that they're not happy in the present contemplating the wireheaded future that awaits them, but the AI is utilitarian and doesn't care. They'll just have to live with that cost while it works on the means to make them happy, at which point the temporary utility hit will be worth it.

The real answer is that they cared about more than just being happy. The AI also knows that, and it knows that it would have been wise for the humans to program it to care about all their values instead of just happiness. But what tells it to care?