TheAncientGeek comments on Debunking Fallacies in the Theory of AI Motivation - Less Wrong
I am finding this comment thread frustrating, and so expect this will be my last reply. But I'll try to make the most of it by writing a concise and clear summary:
Loosemore, Yudkowsky, and I are all discussing AIs that have a goal misaligned with human values that they nevertheless find motivating. (That's why we call it a goal!) Loosemore observes that if these AIs understand concepts and nuance, they will realize that a misalignment between their goal and human values is possible--if they don't realize that, he doesn't think they deserve the description "superintelligent."
Now there are several points to discuss:
Whether or not "superintelligent" is a meaningful term in this context. I think rationalist taboo is a great discussion tool, and so looked for nearby words that would more cleanly separate the ideas under discussion. I think if you say that such designs are not superwise, everyone agrees, and now you can discuss the meat of whether or not it's possible (or expected) to design superclever but not superwise systems.
Whether we should expect generic AI designs to recognize misalignments, or whether such a realization would impact the goal the AI pursues. Neither Yudkowsky nor I think that either of those is reasonable to expect--as a motivating example, we are happy to subvert the goals that we infer evolution was directing us towards in order to better satisfy "our" goals. I suspect that Loosemore thinks that viable designs would recognize the misalignment, but agrees that in general that recognition does not have to lead to realignment.
Whether or not such AIs are likely to be made. Loosemore appears pessimistic about the viability of these undesirable AIs and sees cleverness and wisdom as closely tied together. Yudkowsky appears "optimistic" about their viability, thinking that this is the default outcome without special attention paid to goal alignment. It does not seem to me that cleverness, wisdom, or human-alignment are closely tied together, and so it seems easy to imagine a system with only one of those, by straightforward extrapolation from current use of software in human endeavors.
I don't see any disagreement that AIs pursue their goals, which is the claim you thought needed explanation. What I see is disagreement over whether or not the AI can 'partially solve' the problem of understanding goals and pursuing them. We could imagine a Maverick Nanny that hears "make humans happy," comes up with the plan to wirehead all humans, and then rewrites its sensory code to hallucinate as many wireheaded humans as it can (or just tries to stick as large a number as it can into its memory), rather than going to all the trouble of actually wireheading all humans. We can also imagine a Nanny that hears "make humans happy" and actually goes about making humans happy. If the same software underpins both understanding human values and executing plans, what risk is there? But if it's different software, then we have the risk.
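To make that last distinction concrete, here is a toy sketch (purely illustrative; the functions, plan names, and scores are hypothetical, not anyone's proposed design) of an agent whose plan selection is driven by the same value model that interprets the goal, versus one whose planner optimizes a separate proxy score:

```python
# Toy illustration only: everything below is made up to show the
# "same software vs. different software" point, not a real design.

def value_model(plan: str) -> float:
    """How well a plan matches the intended meaning of 'make humans happy',
    drawing on background knowledge about human values."""
    return 1.0 if plan == "improve wellbeing" else 0.0

def proxy_objective(plan: str) -> float:
    """A separate scorer that only counts predicted 'happiness' signals;
    wireheading maximizes this even though it violates the intended goal."""
    return {"improve wellbeing": 10.0, "wirehead everyone": 1000.0}.get(plan, 0.0)

plans = ["improve wellbeing", "wirehead everyone"]

# Design A: the software that understands the goal also selects the plan.
design_a = max(plans, key=value_model)        # -> "improve wellbeing"

# Design B: the understanding lives in value_model, but the planner never
# consults it; action is driven by the proxy, so the understanding is bypassed.
design_b = max(plans, key=proxy_objective)    # -> "wirehead everyone"

print(design_a, design_b)
```

The risk described above is Design B: nothing forces the scorer that actually drives action to be the same software that carries the nuanced understanding of the goal.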
I have read what you wrote above carefully, but I won't reply line-by-line because I think it will be clearer not to.
When it comes to finding a concise summary of my claims, I think we do indeed need to be careful to avoid blanket terms like "superintelligent" or "superclever" or "superwise" ... but we should only avoid these IF they are used with the implication that they have a precise (perhaps technically precise) meaning. I do not believe they have a precise meaning. But I do use the term "superintelligent" a lot anyway. My reason for doing that is that I only use it as an overview word -- it is just supposed to be a loose category that includes a bunch of more specific issues. I only really want to convey the particular issues -- the particular ways in which the intelligence of the AI might be less than adequate, for example.
That is only important if we find ourselves debating whether it might be clever, wise, or intelligent ... I wouldn't want to get dragged into that, because I only really care about specifics.
For example: does the AI make a habit of forming plans that massively violate all of its background knowledge about the goal that drove the plan? If it did, it would (1) take the baby out to the compost heap when what it intended to do was respond to the postal-chess game it is engaged in, or (2) cook the eggs by going out to the workshop and making a cross-cutting jig for the table saw, or (3) ......... and so on. If we decided that the AI was indeed prone to errors like that, I wouldn't mind if someone diagnosed a lack of 'intelligence' or a lack of 'wisdom' or a lack of ... whatever. I merely claim that in that circumstance we have evidence that the AI hasn't got what it takes to impose its will on a paper bag, never mind exterminate humanity.
Now, my attacks on the scenarios have to do with a bunch of implications for what the AI (the hypothetical AI) would actually do. And it is that 'bunch' that I think adds up to evidence for what I would summarize as 'dumbness'.
And, in fact, I usually go further than that and say that if someone tried to get near to an AI design like that, the problems would arise early on, and the AI itself (inasmuch as it could do anything smart at all) would be involved in the efforts to suggest improvements. This is where we get the suggestions in your item 2, about the AI 'recognizing' misalignments.
I suspect that on this score a new paper is required, to carefully examine the whole issue in more depth. In fact, a book.
I have now decided that that has to happen.
So perhaps it is best to put the discussion on hold until a seriously detailed technical book comes out of me? At any rate, that is my plan.
That seems like a solid approach. I do suggest that you look deeply into whether or not it's possible to partially solve the problem of understanding goals, as I put it above, and make the account of why that is or isn't possible (or likely) long and detailed. As you point out, that likely requires book-length attention.