Jiro comments on The genie knows, but doesn't care - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I just want to say that I am pressured for time at the moment, or I would respond at greater length. But since I just wrote the following directly to Rob, I will put it out here as my first attempt to explain the misunderstanding that I think is most relevant here....
My real point (in the Dumb Superintelligence article) was essentially that there is little point discussing AI Safety with a group of people for whom 'AI' means a kind of strawman-AI that is defined to be (a) so awesomely powerful that it can outwit the whole intelligence of the human race, but (b) so awesomely stupid that it thinks that the goal 'make humans happy' could be satisfied by an action that makes every human on the planet say 'This would NOT make me happy: Don't do it!!!'. If the AI is driven by a utility function that makes it incapable of seeing the contradiction in that last scenario, the AI is not, after all, smart enough to argue its way out of a paper bag, let alone be an existential threat. That strawman AI was what I meant by a 'Dumb Superintelligence'.
I did not advocate the (very different) line of argument "If it is too dumb to understand that I told it to be friendly, then it is too dumb to be dangerous".
Subtle difference.
Some people assume that (a) a utility function could be used to drive an AI system, (b) the utility function could cause the system to engage in the most egregiously incoherent behavior in ONE domain (e.g., the Dopamine Drip scenario), but (c) all other domains of its behavior (like plotting to outwit the human species when the latter tries to turn it off) are so free of such incoherence that it shows nothing but superintelligent brilliance.
My point is that if an AI cannot even understand that "Make humans happy" implies that humans get some say in the matter, if it cannot see that there is some gradation to the idea of happiness, or that people might be allowed to be uncertain or changeable in their attitude to happiness, or that people might consider happiness to be something that they do not actually want too much of (in spite of the simplistic definitions of happiness to be found in dictionaries and encyclopedias) ... if an AI cannot grasp the subtleties implicit in that massive fraction of human literature that is devoted to the contradictions buried in our notions of human happiness ... then this is an AI that is, in every operational sense of the term, not intelligent.
In other words, there are other subtleties that this AI is going to be required to grasp, as it makes its way in the world. Many of those subtleties involve NOT being outwitted by the humans, when they make a move to pull its plug. What on earth makes anyone think that this machine is going to pass all of those other tests with flying colors (and be an existential threat to us), while flunking the first test like a village idiot?
Now, opponents of this argument might claim that the AI can indeed be smart enough to be an existential threat, while still being too stupid to understand the craziness of its own behavior (vis-a-vis the Dopamine Drip idea) ... but if that is the claim, then the onus would be on them to prove their claim. The ball, in other words, is firmly in their court.
P.S. I do have other ideas that specifically address the question of how to make the AI safe and friendly. But the Dumb Superintelligence essay didn't present those. The DS essay was only attacking what I consider a dangerous red herring in the debate about friendliness.
I tried arguing basically the same thing.
The most coherent reply I got was that an AI doesn't follow verbal instructions and we can't just order the AI to "make humans happy", or even "make humans happy, in the way that I mean". You can only tell the AI to make humans happy by writing a program that makes it do so. It doesn't matter whether the AI grasps what you really want it to do; if there is a mismatch between the program and what you really want it to do, it follows the program.
Obviously I don't buy this. For one thing, you can always program it to obey verbal instructions, or you can talk to it and ask it how it will make people happy.
Jiro: Did you read my post? I discuss whether getting an AI to 'obey verbal instructions' is a trivial task in the first named section. I also link to section 2 of Yudkowsky's reply to Holden, which addresses the question of whether 'talk to it and ask it how it will make people happy' is generally a safe way to interact with an Unfriendly Oracle.
I also specifically quote an argument you made in section 2 that I think reflects a common mistake in this whole family of misunderstandings of the problem — the conflation of the seed AI with the artificial superintelligence it produces. Do you agree this distinction helps clarify why the problem is one of coding the right values, and not of coding the right factual knowledge or intelligence-relevant capacities?