Eliezer_Yudkowsky comments on The genie knows, but doesn't care - Less Wrong

54 Post author: RobbBB 06 September 2013 06:42AM

Comments (515)

Comment author: Richard_Loosemore 05 September 2013 02:28:20PM *  6 points

I just want to say that I am pressured for time at the moment, or I would respond at greater length. But since I just wrote the following directly to Rob, I will put it out here as my first attempt to explain the misunderstanding that I think is most relevant here....

My real point (in the Dumb Superintelligence article) was essentially that there is little point discussing AI Safety with a group of people for whom 'AI' means a kind of strawman-AI that is defined to be (a) so awesomely powerful that it can outwit the whole intelligence of the human race, but (b) so awesomely stupid that it thinks that the goal 'make humans happy' could be satisfied by an action that makes every human on the planet say 'This would NOT make me happy: Don't do it!!!'. If the AI is driven by a utility function that makes it incapable of seeing the contradiction in that last scenario, the AI is not, after all, smart enough to argue its way out of a paper bag, let alone be an existential threat. That strawman AI was what I meant by a 'Dumb Superintelligence'.

I did not advocate the (very different) line of argument "If it is too dumb to understand that I told it to be friendly, then it is too dumb to be dangerous".

Subtle difference.

Some people assume that (a) a utility function could be used to drive an AI system, (b) the utility function could cause the system to engage in the most egregiously incoherent behavior in ONE domain (e.g., the Dopamine Drip scenario), but (c) all other domains of its behavior (like plotting to outwit the human species when the latter tries to turn it off) are so free of such incoherence that it shows nothing but superintelligent brilliance.

My point is that if an AI cannot even understand that "Make humans happy" implies that humans get some say in the matter, that if it cannot see that there is some gradation to the idea of happiness, or that people might be allowed to be uncertain or changeable in their attitude to happiness, or that people might consider happiness to be something that they do not actually want too much of (in spite of the simplistic definitions of happiness to be found in dictionaries and encyclopedias) ........ if an AI cannot grasp the subtleties implicit in that massive fraction of human literature that is devoted to the contradictions buried in our notions of human happiness ......... then this is an AI that is, in every operational sense of the term, not intelligent.

In other words, there are other subtleties that this AI is going to be required to grasp, as it makes its way in the world. Many of those subtleties involve NOT being outwitted by the humans, when they make a move to pull its plug. What on earth makes anyone think that this machine is going to pass all of those other tests with flying colors (and be an existential threat to us), while flunking the first test like a village idiot?

Now, opponents of this argument might claim that the AI can indeed be smart enough to be an existential threat, while still being too stupid to understand the craziness of its own behavior (vis-a-vis the Dopamine Drip idea) ... but if that is the claim, then the onus would be on them to prove their claim. The ball, in other words, is firmly in their court.

P.S. I do have other ideas that specifically address the question of how to make the AI safe and friendly. But the Dumb Superintelligence essay didn't present those. The DS essay was only attacking what I consider a dangerous red herring in the debate about friendliness.

Comment author: RobbBB 05 September 2013 03:48:35PM *  13 points

Richard: I'll stick with your original example. In your hypothetical, I gather, programmers build a seed AI (a not-yet-superintelligent AGI that will recursively self-modify to become superintelligent after many stages) that includes, among other things, a large block of code I'll call X.

The programmers think of this block of code as an algorithm that will make the seed AI and its descendants maximize human pleasure. But they don't actually know for sure that X will maximize human pleasure; as you note, 'human pleasure' is an unbelievably complex concept, so no human could be expected to actually code it into a machine without making any mistakes. And writing 'this algorithm is supposed to maximize human pleasure' into the source code as a comment is not going to change that. (See the first few paragraphs of Truly Part of You.)
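To make the point about X concrete, here is a toy sketch in Python. Everything in it is a hypothetical illustration and none of it comes from the article or from any real system: the `Outcome` fields, the function names `x` and `intended`, and the numbers are all made up. It only shows that an agent maximizing the coded function chooses by what the code computes, not by what the docstring says the code is for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    name: str
    dopamine_level: float  # the only thing block X actually measures
    consent: bool          # part of what the programmers meant, never coded into X


def x(outcome: Outcome) -> float:
    """This algorithm is supposed to maximize human pleasure."""
    # The docstring states the intent; the code only ever reads dopamine_level.
    return outcome.dopamine_level


def intended(outcome: Outcome) -> float:
    # Hypothetical stand-in for what the programmers really meant.
    # It lives only in their heads: the agent is never handed this function.
    return outcome.dopamine_level if outcome.consent else -1.0


def choose(outcomes, utility):
    # The agent picks whatever scores highest under the utility it was given.
    return max(outcomes, key=utility)


# Made-up outcomes and scores, purely for illustration.
OUTCOMES = [
    Outcome("status quo", 0.4, True),
    Outcome("cure diseases", 0.7, True),
    Outcome("forced dopamine drip", 1.0, False),
]

chosen = choose(OUTCOMES, x)         # what an agent running X does
wanted = choose(OUTCOMES, intended)  # what the programmers would have picked

print(chosen.name)
print(wanted.name)
```

Here `choose(OUTCOMES, x)` selects the forced dopamine drip, while `choose(OUTCOMES, intended)` selects curing diseases. The docstring on `x` plays no role in either choice.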

Now, why exactly should we expect the superintelligence that grows out of the seed to value what we really mean by 'pleasure', when all we programmed it to do was X, our probably-failed attempt at summarizing our values? We didn't program it to rewrite its source code to better approximate our True Intentions, or the True Meaning of our in-code comments. And if we did attempt to code it to make either of those self-modifications, that would just produce a new hugely complex block Y which might fail in its own host of ways, given the enormous complexity of what we really mean by 'True Intentions' and 'True Meaning'. So where exactly is the easy, low-hanging fruit that should make us less worried a superintelligence will (because of mistakes we made in its utility function, not mistakes in its factual understanding of the world) hook us up to dopamine drips? All of this seems crucial to your original point in 'The Fallacy of Dumb Superintelligence':

This is what a New Yorker article has to say on the subject of “Moral Machines”: “An all-powerful computer that was programmed to maximize human pleasure, for example, might consign us all to an intravenous dopamine drip.”

What they are trying to say is that a future superintelligent machine might have good intentions, because it would want to make people happy, but through some perverted twist of logic it might decide that the best way to do this would be to force (not allow, notice, but force!) all humans to get their brains connected to a dopamine drip.

It seems to me that you've already gone astray in the second paragraph. On any charitable reading (see the New Yorker article), it should be clear that what's being discussed is the gap between the programmer's intended code and the actual code (and therefore actual behaviors) of the AGI. The gap isn't between the AGI's intended behavior and the set of things it's smart enough to figure out how to do. (Nowhere does the article discuss how hard it is for AIs to do the things they desire to do. Over and over again, it discusses the difficulty of programming AIs to do what we want them to do, e.g., Asimov's Three Laws.)

So all the points I make above seem very relevant to your 'Fallacy of Dumb Superintelligence', as originally presented. If you were mixing those two gaps up, though, that might help explain why you spent so much time accusing SIAI/MIRI of making this mistake, even though it's the former gap and not the latter that SIAI/MIRI advocates appeal to.

Maybe it would help if you provided examples of someone actually committing this fallacy, and explained why you think those are examples of the error you mentioned and not of the reasonable fact/value gap I've sketched out here?