Part of the Muehlhauser interview series on AGI.
Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.
Pei Wang is an AGI researcher at Temple University, and Chief Executive Editor of the Journal of Artificial General Intelligence.
Luke Muehlhauser:
[Apr. 7, 2012]
Pei, I'm glad you agreed to discuss artificial general intelligence (AGI) with me. I hope our dialogue will be informative to many readers, and to us!
On what do we agree? Ben Goertzel and I agreed on the statements below (well, I cleaned up the wording a bit for our conversation):
- Involuntary death is bad, and can be avoided with the right technology.
- Humans can be enhanced by merging with technology.
- Humans are on a risky course in general, because powerful technologies can destroy us, humans are often stupid, and we are unlikely to voluntarily halt technological progress.
- AGI is likely this century.
- AGI will greatly transform the world. It is a potential existential risk, but could also be the best thing that ever happens to us if we do it right.
- Careful effort will be required to ensure that AGI results in good things rather than bad things for humanity.
You stated in private communication that you agree with these statements, depending on what is meant by "AGI." So, I'll ask: What do you mean by "AGI"?
I'd also be curious to learn what you think about AGI safety. If you agree that AGI is an existential risk that will arrive this century, and if you value humanity, one might expect you to think it's very important that we accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI. (This is what Anna Salamon and I recommend in Intelligence Explosion: Evidence and Import.) What are your thoughts on the matter?
Pei Wang:
[Apr. 8, 2012]
By “AGI” I mean computer systems that follow roughly the same principles as the human mind. Concretely, to me “intelligence” is the ability to adapt to the environment under insufficient knowledge and resources, or, equivalently, to follow the “Laws of Thought” that realize a relative rationality, allowing the system to apply its available knowledge and resources as fully as possible. See [1, 2] for detailed descriptions and comparisons with other definitions of intelligence.
Such a computer system will share many properties with the human mind; however, it will not have exactly the same behaviors or problem-solving capabilities as a typical human being, since, as an adaptive system, the behaviors and capabilities of an AGI depend not only on its built-in principles and mechanisms, but also on its body, initial motivation, and individual experience, which are not necessarily human-like.
Like all major breakthroughs in science and technology, the creation of AGI will be both a challenge and an opportunity for humankind. Like scientists and engineers in all fields, we AGI researchers should use our best judgment to ensure that AGI results in good things rather than bad things for humanity.
Even so, the suggestion to “accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI” is wrong, for the following major reasons:
- It is based on a highly speculative understanding about what kind of “AGI” will be created. The definition of intelligence in Intelligence Explosion: Evidence and Import is not shared by most AGI researchers. In my opinion, that kind of “AGI” will never be built.
- Even if the above definition is considered only as one possibility among other versions of AGI, it will be the actual AI research that tells us which possibility becomes reality. To ban scientific research on the basis of imaginary risks damages humanity no less than risky research does.
- If intelligence turns out to be adaptive (as I and many others believe), then a “friendly AI” will be mainly the result of proper education, not proper design. There will be no way to design a “safe AI”, just as there is no way to require parents to give birth only to “safe babies” who will never become criminals.
- The “friendly AI” approach advocated by Eliezer Yudkowsky has several serious conceptual and theoretical problems, and is not accepted by most AGI researchers. The AGI community has ignored it, not because it is indisputable, but because people have not bothered to criticize it.
In summary, though the safety of AGI is indeed an important issue, currently we don’t know enough about the subject to draw any firm conclusion. Higher safety can only be achieved by more research on all related topics, rather than by pursuing approaches that have no solid scientific foundation. I hope your Institute will make a constructive contribution to the field by studying a wider range of AGI projects, rather than generalizing from a few or committing to a conclusion without considering counter-arguments.
- [1] Pei Wang, “What Do You Mean by ‘AI’?”, Proceedings of AGI-08, pp. 362-373, 2008.
- [2] Pei Wang, “The Assumptions on Knowledge and Resources in Models of Rationality”, International Journal of Machine Consciousness, Vol. 3, No. 1, pp. 193-218, 2011.
Luke:
[Apr. 8, 2012]
I appreciate the clarity of your writing, Pei. “The Assumptions on Knowledge and Resources in Models of Rationality” belongs to a set of papers that make up half of my argument for why the only people allowed to do philosophy should be those with primary training in cognitive science, computer science, or mathematics. (The other half of that argument is made by examining most of the philosophy papers written by those without primary training in cognitive science, computer science, or mathematics.)
You write that my recommendation to “accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI” is wrong for four reasons, which I will respond to in turn:
- “It is based on a highly speculative understanding about what kind of ‘AGI’ will be created.” Actually, it seems to me that my notion of AGI is broader than yours. I think we can use your preferred definition and get the same result. (More on this below.)
- “…it will be the actual AI research that tells us which possibility becomes reality. To ban scientific research on the basis of imaginary risks damages humanity no less than risky research does.” Yes, of course. But we argue (very briefly) that a very broad range of artificial agents with a roughly human-level capacity for adaptation (under AIKR) will manifest convergent instrumental goals. The fuller argument for this is made in Nick Bostrom’s “The Superintelligent Will.”
- “…a ‘friendly AI’ will be mainly the result of proper education, not proper design. There will be no way to design a ‘safe AI’, just as there is no way to require parents to give birth only to ‘safe babies’ who will never become criminals.” Without being more specific, I can’t tell if we actually disagree on this point. The most promising approach (that I know of) for Friendly AI is one that learns human values and then “extrapolates” them so that the AI optimizes for what we would value if we knew more, were more the people we wish we were, etc., instead of optimizing for our present, relatively ignorant values. (See “The Singularity and Machine Ethics.”)
- “The ‘friendly AI’ approach advocated by Eliezer Yudkowsky has several serious conceptual and theoretical problems.”
I agree. Friendly AI may be incoherent and impossible. In fact, it looks impossible right now. But that’s often how problems look right before we make a few key insights that make things clearer, and show us (e.g.) how we were asking a wrong question in the first place. The reason I advocate Friendly AI research (among other things) is that it may be the only way to secure a desirable future for humanity (see “Complex Value Systems are Required to Realize Valuable Futures”), even if it looks impossible. That is why Yudkowsky once proclaimed: “Shut Up and Do the Impossible!” When we don’t know how to make progress on a difficult problem, sometimes we need to hack away at the edges.
I certainly agree that “currently we don’t know enough about [AGI safety] to draw any firm conclusion.” That is why more research is needed.
As for your suggestion that “Higher safety can only be achieved by more research on all related topics,” I wonder whether you think that is true of all subjects, or only of AGI. For example, should mankind vigorously pursue research on how to make Ron Fouchier’s altered H5N1 bird flu virus even more dangerous and deadly to humans, because “higher safety can only be achieved by more research on all related topics”? (I’m not trying to broadly compare AGI capabilities research to supervirus research; I’m just trying to understand the nature of your rejection of my recommendation for mankind to decelerate AGI capabilities research and accelerate AGI safety research.)
Hopefully I have clarified my own positions and my reasons for them. I look forward to your reply!
Pei:
[Apr. 10, 2012]
Luke: I’m glad to see the agreements, and will only comment on the disagreements.
- “my notion of AGI is broader than yours” In scientific theories, broader notions are not always better. In this context, a broad notion may cover too many diverse approaches to support any non-trivial conclusion. For example, AIXI and NARS are fundamentally different in many respects, and NARS does not approximate AIXI. It is fine to call both “AGI” in view of their similar ambitions, but theoretical or technical claims based on such a broad notion are hard to make. For this reason, almost all of your descriptions of AIXI are hardly relevant to NARS, or to most other existing “AGI” projects.
- “I think we can use your preferred definition and get the same result.” No, you cannot. According to my definition, AIXI is not intelligent, since it doesn’t obey AIKR (a standard formulation of AIXI is sketched below). Since most of your conclusions are about that type of system, they will fall with it.
- “a very broad range of artificial agents with a roughly human-level capacity for adaptation (under AIKR) will manifest convergent instrumental goals” I cannot access Bostrom’s paper, but I guess that he made additional assumptions. In general, the goal structure of an adaptive system changes according to the system’s experience, so unless you restrict the experience of these artificial agents, there is no way to restrict their goals. I agree that, to make AGI safe, controlling their experience will probably be the main approach (which is what “education” is all about), but even that cannot guarantee safety (see below).
- “The Singularity and Machine Ethics.” I don’t have the time to do a detailed review, but I can frankly tell you why I disagree with its main suggestion, “to program the AI’s goal system to want what we want before the AI self-improves beyond our capacity to control it”:
- As I mentioned above, the goal system of an adaptive system evolves as a function of the system’s experience. No matter what initial goals are implanted, under AIKR the derived goals are not necessarily their logical implications, which is not necessarily a bad thing (humanity is not a logical implication of human biological nature, either), though it means the designer does not have full control over the goal system (unless the designer also fully controls the experience of the system, which is practically impossible). See “The self-organization of goals” for detailed discussion.
- Even if the system’s goal system could be made to fully agree with certain given specifications, I wonder where these specifications would come from: we human beings are not known for reaching consensus on much of anything, let alone on a topic this big.
- Even if we could agree on the goals of AIs, and found a way to enforce them in AIs, that still doesn’t mean we would have “friendly AI”. Under AIKR, a system can cause damage simply because of its ignorance in a novel situation.
For these reasons, under AIKR we cannot have AI with guaranteed safety or friendliness, though we can and should always do our best to make such systems safer, based on our best judgment (which can still be wrong, due to AIKR). Applying logic or probability theory in the design won’t change the big picture, because what we are after are empirical conclusions, not theorems within those theories. Only the latter can have proved correctness; the former cannot (though they can have strong evidential support).
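To make the AIXI point above concrete, here is one standard way to write AIXI’s decision rule (a sketch following Hutter’s formulation; notation varies across presentations, and technical details are omitted):

$$
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$

Here the $a_i$, $o_i$, and $r_i$ are actions, observations, and rewards, $U$ is a universal Turing machine, $m$ is the planning horizon, and $\ell(q)$ is the length of the environment program $q$. Choosing even a single action requires summing over every program consistent with the history and maximizing over all future action sequences, i.e., it presumes unbounded knowledge and computational resources. That is precisely what AIKR rules out, which is why I do not count AIXI as intelligent in my sense.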
“I’m just trying to understand the nature of your rejection of my recommendation for mankind to decelerate AGI capabilities research and accelerate AGI safety research”
Frankly, I don’t think anyone currently has the evidence or argument to ask others to decelerate their research out of safety considerations, though it is perfectly fine to promote your own research direction and try to attract more people to it. However, unless you have the right idea about what AGI is and how it can be built, it is very unlikely that you will know how to make it safe.
Luke:
[Apr. 10, 2012]
I didn’t mean to imply that my notion of AGI was “better” because it is broader. I was merely responding to your claim that my argument for differential technological development (in this case, decelerating AI capabilities research while accelerating AI safety research) depends on a narrow notion of AGI that you believe “will never be built.” But this isn’t true, because my notion of AGI is very broad and includes your notion of AGI as a special case. My notion of AGI includes both AIXI-like “intelligent” systems and also “intelligent” systems which obey AIKR, because both kinds of systems (if implemented/approximated successfully) could efficiently use resources to achieve goals, and that is the definition Anna and I stipulated for “intelligence.”
Let me back up. In our paper, Anna and I stipulate that, for its purposes, we use “intelligence” to mean an agent’s capacity to efficiently use resources (such as money or computing power) to optimize the world according to its preferences. You could call this “instrumental rationality” or “ability to achieve one’s goals” or something else if you prefer; I don’t wish to encourage a “merely verbal” dispute between us. We also specify that by “AI” (in our discussion, “AGI”) we mean “systems which match or exceed the intelligence [as we just defined it] of humans in virtually all domains of interest.” That is: by “AGI” we mean “systems which match or exceed the human capacity for efficiently using resources to achieve goals in virtually all domains of interest.” So I’m not sure I understood you correctly: did you really mean to say that this kind of AGI “will never be built”? If so, why do you think that? Are humans very close to a natural ceiling on an agent’s ability to achieve goals?
What we argue in “Intelligence Explosion: Evidence and Import,” then, is that a very broad range of AGIs pose a threat to humanity, and therefore we should be sure we have the safety part figured out as much as we can before we figure out how to build AGIs. But this is the opposite of what is happening now. Right now, almost all AGI-directed R&D resources are being devoted to AGI capabilities research rather than AGI safety research. This is the case even though there is AGI safety research that will plausibly be useful given almost any final AGI architecture, for example the problem of extracting coherent preferences from humans (so that we can figure out which rules / constraints / goals we might want to use to bound an AGI’s behavior).
I do hope you have the chance to read “The Superintelligent Will.” It is linked near the top of nickbostrom.com and I will send it to you via email.
But perhaps I have been driving the direction of our conversation too much. Don’t hesitate to steer it towards topics you would prefer to address!
Pei:
[Apr. 12, 2012]
Hi Luke,
I don’t expect to resolve all the related issues in such a dialogue. In the following, I’ll return to what I see as the major issues and summarize my position.
- Whether we can build a “safe AGI” by giving it a carefully designed “goal system”: My answer is negative. It is my belief that an AGI will necessarily be adaptive, which implies that the goals it actively pursues constantly change as a function of its experience, and are not fully restricted by its initial (given) goals. As described in my eBook (cited previously), goal derivation is based on the system’s beliefs, which may lead to conflicts among goals. Furthermore, even if the goals are fixed, they cannot fully determine the consequences of the system’s behaviors, which also depend on the system’s available knowledge and resources, etc. If all those factors were also fixed, then we might get guaranteed safety, but the system would not be intelligent; it would be just like today’s ordinary (unintelligent) computers.
- Whether we should figure out how to build “safe AGI” before figuring out how to build “AGI”: My answer is negative, too. As in all adaptive systems, the behaviors of an intelligent system are determined both by its nature (design) and its nurture (experience). The system’s intelligence mainly comes from its design, and is “morally neutral”, in the sense that (1) any goals can be implanted initially, and (2) very different goals can be derived from the same initial design and goals, given different experience. Therefore, to control the morality of an AI mainly means to educate it properly (i.e., to control its experience, especially in its early years). Of course, the initial goals matter, but it is wrong to assume that the initial goals will always be the dominant goals in decision-making processes. Developing a non-trivial education theory of AGI requires a good understanding of how the system works, so if we don’t know how to build an AGI, there is no chance for us to know how to make it safe. I don’t think a good education theory can be “proved” in advance, purely theoretically. Rather, we’ll learn most of it by interacting with baby AGIs, just as many of us learn how to educate children.
Such a short position statement may not convince you, but I hope you can consider it at least as a possibility. I guess the final consensus can only come from further research.
Luke:
[Apr. 19, 2012]
Pei,
I agree that an AGI will be adaptive in the sense that its instrumental goals will adapt as a function of its experience. But I do think advanced AGIs will have convergent instrumental reasons to preserve their final (or “terminal”) goals. As Bostrom explains in “The Superintelligent Will”:
An agent is more likely to act in the future to maximize the realization of its present final goals if it still has those goals in the future. This gives the agent a present instrumental reason to prevent alterations of its final goals.
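To gloss Bostrom’s point in rough decision-theoretic terms (my sketch, not his formalism): an agent that evaluates outcomes with its current final goals, represented as a utility function $U$, will also evaluate any proposed change to a new utility function $U'$ using $U$ itself. Since a future steered by $U'$ will typically score lower under $U$ than a future steered by $U$,

$$
\mathbb{E}_U[\text{outcome} \mid \text{keep } U] \;\ge\; \mathbb{E}_U[\text{outcome} \mid \text{adopt } U'],
$$

the agent usually has an instrumental reason to resist modification of its final goals, even while its instrumental (sub)goals adapt freely with experience.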
I also agree that even if an AGI’s final goals are fixed, the AGI’s behavior will also depend on its knowledge and resources, and therefore we can’t exactly predict its behavior. But if a system has lots of knowledge and resources, and we know its final goals, then we can predict with some confidence that whatever it does next, it will be something aimed at achieving those final goals. And the more knowledge and resources it has, the more confident we can be that its actions will successfully aim at achieving its final goals. So if a superintelligent machine’s only final goal is to play through Super Mario Bros within 30 minutes, we can be pretty confident it will do so. The problem is that we don’t know how to tell a superintelligent machine to do things we want, so we’re going to get many unintended consequences for humanity (as argued in “The Singularity and Machine Ethics”).
You also said that you can’t see what safety work there is to be done without having intelligent systems (e.g. “baby AGIs”) to work with. I provided a list of open problems in AI safety here, and most of them don’t require that we know how to build an AGI first. For example, one reason we can’t tell an AGI to do what humans want is that we don’t know what humans want, and there is work to be done in philosophy and in preference acquisition in AI in order to get clearer about what humans want.
Pei:
[Apr. 20, 2012]
Luke,
I think we have made our different beliefs clear, so this dialogue has achieved its goal. It won’t be an efficient use of our time to attempt to convince each other at this moment, and each side can analyze these beliefs in proper publications at a future time.
Now we can let the readers consider these arguments and conclusions.
Luke, what do you mean here when you say, "Friendly AI may be incoherent and impossible"?
The Singularity Institute's page "What is Friendly AI?" defines "Friendly AI" as "an AI that takes actions that are, on the whole, beneficial to humans and humanity." Surely you don't mean to say, "The idea of an AI that takes actions that are, on the whole, beneficial to humans and humanity may be incoherent or impossible"?
Eliezer's paper "Artificial Intelligence as a Positive and Negative Factor in Global Risk" talks about "an AI created with specified motivations." But it's pretty clear that that's not the only thing you and he have in mind, because part of the problem is making sure the motivations we give an AI are the ones we really want to give it.
If you meant neither of those things, what did you mean? "Provably friendly"? "One whose motivations express an ideal extrapolation of our values"? (It seems a flawed extrapolation could still give results that are on the whole beneficial, so this is different than the first definition suggested above.) Or something else?