I agree. Friendly AI may be incoherent and impossible. In fact, it looks impossible right now. But that’s often how problems look right before we make a few key insights that make things clearer, and show us (e.g.) how we were asking the wrong question in the first place. The reason I advocate Friendly AI research (among other things) is that it may be the only way to secure a desirable future for humanity (see “Complex Value Systems are Required to Realize Valuable Futures”), even if it looks impossible. That is why Yudkowsky once proclaimed: “Shut Up and Do the Impossible!” When we don’t know how to make progress on a difficult problem, sometimes we need to hack away at the edges.
Just a suggestion for future dialogs: The amount of Less Wrong jargon, links to Less Wrong posts explaining that jargon, and the Yudkowsky "proclamation" in this paragraph is all a bit squicky, alienating and potentially condescending. And I think they muddle the point you're making.
Anyway, biting Pei's bullet for a moment, if building an AI isn't safe, if it's, like Pei thinks, similar to educating a child (except, presumably, with a few orders of magnitude more uncertainty about the outc...
Just a suggestion for future dialogs: The amount of Less Wrong jargon, links to Less Wrong posts explaining that jargon, and the Yudkowsky "proclamation" in this paragraph is all a bit squicky, alienating and potentially condescending.
Seconded; that bit - especially the "Yudkowsky proclaimed" - stuck out for me.
Those phrases would be fine with me if they weren't hyperlinked to Less Wrong posts. They're not LW-specific notions, so there shouldn't be a reason to link an Artificial Intelligence professor to blog posts discussing them. Anyway, I'm just expressing my reaction to the paragraph. You can take it or leave it.
If you clear away all the noise arising from the fact that this interaction constitutes a clash of tribal factions (here comes Young Upstart Outsider trying to argue that Established Academic Researcher is really a Mad Scientist), you can actually find at least one substantial (implicit) claim by Wang that is worth serious consideration from SI's point of view. And that is that building FAI may require (some) empirical testing prior to "launch". It may not be enough to simply try to figure everything out on paper beforehand, and then wincingly press the red button with the usual "here goes nothing!" It may instead be necessary to build toy models (that can hopefully be controlled, obviously) and see how they work, to gain information about the behavior of (aspects of) the code.
Similarly, in the Goertzel dialogue, I would have argued (and meant to argue) that Goertzel's "real point" was that EY/SI overestimate the badness of (mere) 95% success; that the target, while somewhat narrow, isn't as narrow as the SI folks claim. This is also worth serious consideration, since one can imagine a situation where Goertzel (say) is a month away from launching his 80%-Friendly AI, while EY believes that his ready-to-go 95%-Friendly design can be improved to 97% within one month and 100% within two...what should EY do, and what will he do based on his current beliefs?
If you clear away all the noise arising from the fact that this interaction constitutes a clash of tribal factions (here comes Young Upstart Outsider trying to argue that Established Academic Researcher is really a Mad Scientist), you can actually find at least one substantial (implicit) claim by Wang that is worth serious consideration from SI's point of view. And that is that building FAI may require (some) empirical testing prior to "launch".
Testing is common practice. Surely no competent programmer would ever advocate deploying a complex program without testing it.
I'm glad to see the large amount of sincere discussion here, and thanks to Luke and Pei for doing this.
Although most people are not guilty of this, I would like to personally plead that people keep references to Pei civil; insulting him or belittling his ideas without taking the time to genuinely respond to them (linking to a sequence post doesn't count) will make future people less likely to want to hold such discussions, which will be bad for the community, whichever side of the argument you are on.
My small non-meta contribution to this thread: I suspect that some of Pei's statements that seem wrong at face value are a result of him lacking the language to state things in a way that would satisfy most LWers. Can someone try to charitably translate his arguments into such terms? In particular, his stuff about goal adaptation is somewhat similar in spirit to jtaylor's recent posts on learning utility functions.
First, thank you for publishing this illuminating exchange.
I must say that Pei Wang sounds way more convincing to an uninitiated, but curious and mildly intelligent lay person (that would be me). Does not mean he is right, but he sure does make sense.
When Luke goes on to make a point, I often get lost in jargon ("manifest convergent instrumental goals") or have to look up a paper that Pei (or other AGI researchers) does not hold in high regard. When Pei Wang makes an argument, it is intuitively clear and does not require going through a complex chain of reasoning outlined in the works of one Eliezer Yudkowsky and not vetted by the AI community at large. This is, of course, not a guarantee of its validity, but it sure is easier to follow.
Some of the statements are quite damning, actually: "The “friendly AI” approach advocated by Eliezer Yudkowsky has several serious conceptual and theoretical problems, and is not accepted by most AGI researchers. The AGI community has ignored it, not because it is indisputable, but because people have not bothered to criticize it." If one were to replace AI with physics, I would tend to dismiss EY as a crank just based on this...
What makes me trust Pei Wang more than Luke is the common-sense statements like "to make AGI safe, to control their experience will probably be the main approach (which is what “education” is all about), but even that cannot guarantee safety."...
This sort of "common sense" can be highly misleading! For example, here Wang is drawing parallels between a nascent AI and a human child to argue about nature vs. nurture. But if we compare a human and a different social animal, we'll see that most of the differences in their behavior are innate and the gap can't be closed by any amount of "education": e.g. humans can't really become as altruistic and self-sacrificing as worker ants because they'll still retain some self-preservation instinct, no matter how you brainwash them.
What makes Wang think that this sort of fixed attitude - which can be made more hard-wired than the instincts of biological organisms - cannot manifest itself in an AGI?
(I'm certain that a serious AI thinker, or just someone with good logic and clear thinking, could find a lot more holes in such "common sense" talk.)
As a side point, I cannot help but wonder if the outcome of this discussion would have been different were it EY and not LM involved in it.
I expect Eliezer to have displayed less patience than Luke did (a more or less generalizable prediction.)
When Pei Wang makes an argument, it is intuitively clear and does not require going through a complex chain of reasoning […]
I felt the main reason was anthropomorphism:
Such a computer system will share many properties with the human mind; […]
If intelligence turns out to be adaptive (as believed by me and many others), then a “friendly AI” will be mainly the result of proper education, not proper design.
Note that I don't want to accuse Pei Wang of anthropomorphism. My point is that his choice of words appeals to our anthropomorphism, which is highly intuitive. Another example of a highly intuitive, but not very helpful, sentence:
It is my belief that an AGI will necessarily be adaptive, which implies that the goals it actively pursues constantly change as a function of its experience, and are not fully restricted by its initial (given) goals.
Intuitive, because applied to humans, we can easily see that we change plans according to experience: applying for a PhD, say, then dropping out when you find you don't enjoy it after all. You can abandon the goal of doing research and take up a new goal of, say, practicing and teaching surfing.
Not very helpful, because the split betwee...
I wish Pei had taken the time to read the articles I repeatedly linked to, for they were written precisely to explain why his position is misguided.
I think you should have listed a couple of the most important articles at the beginning as necessary background reading to understand your positions and terminology (like Pei did with his papers), and then only used links very sparingly afterwards. Unless you already know your conversation partner takes you very seriously, you can't put 5 hyperlinks in an email and expect the other person to read them all. When they see that many links, they'll probably just ignore all of them. (Not to mention the signaling issues that others already pointed out.)
I was dismayed that Pei has such a poor opinion of the Singularity Institute's arguments, and that he thinks we are not making a constructive contribution. If we want the support of the AGI community, it seems we'll have to improve our communication.
It might be more worthwhile to try to persuade graduate students and undergraduates who might be considering careers in AI research, since the personal cost associated with deciding that AI research is dangerous is lower for them. So less motivated cognition.
I wouldn't be as worried if they took it upon themselves to study AI risk independently, but rather than "not listen to Eliezer", the actual event seems to be "not pay attention to AI risks" as a whole.
Think about it this way. There are a handful of people like Jürgen Schmidhuber who share SI's conception of AGI and its potential. But most AI researchers, including Pei Wang, do not buy the idea of AGIs that can quickly and vastly self-improve to the point of getting out of control.
Telling most people in the AI community about AI risks is similar to telling neuroscientists that their work might lead to the creation of a society of uploads which will copy themselves millions of times and pose a risk due to the possibility of a value drift. What reaction do you anticipate?
One neuroscientist thought about it for a while, then said "yes, you're probably right". Then he co-authored with me a paper touching upon that topic. :-)
(Okay, probably not a very typical case.)
The “friendly AI” approach advocated by Eliezer Yudkowsky has several serious conceptual and theoretical problems, and is not accepted by most AGI researchers. The AGI community has ignored it, not because it is indisputable, but because people have not bothered to criticize it.
AI: A Modern Approach seems to take the matter seriously.
I don't think Yudkowsky has been ignored through lack of criticism. It's more that he heads a rival project that doesn't seem too interested in collaboration with other teams, and instead spits out negative PR about them - e.g.:
And if Novamente should ever cross the finish line, we all die.
To those who disagree with Pei Wang: How would you improve his arguments? What assumptions would make his thesis correct?
If I understand correctly, his theses are that the normal research path will produce safe AI because it won't blow up out of our control or generally behave like a Yudkowsky/Bostrom-style AI, and that trying to prove friendliness in advance is futile (and yet AI is still a good idea) because the AI will have to have "adaptive" goals, which for some reason must extend to terminal goals.
It is my belief that an AGI will necessarily be adaptive, which implies that the goals it actively pursues constantly change as a function of its experience, and are not fully restricted by its initial (given) goals.
He needs to taboo "adaptive", read and understand Bostrom's AI-behaviour stuff, comprehend the superpowerful-optimizer view, and then explain exactly why an AI cannot have a fixed goal architecture.
If AIs can't have a fixed goal architecture, Wang needs to show that AIs with unpredictable goals are somehow safe, or start speaking out against AI.
So what sort of inconvenient world would it take for Wang's major conclusions to be correct?
I don't know, I'm not good enough at this steel-man thing, and my wife is sending me to bed.
It amuses me to think of Eliezer and Pei as like yang and yin. Eliezer has a very yang notion of AI: sharp goals, optimizing, conquering the world in a hurry. Pei's AI doesn't just have limitations - bounded rationality - its very nature is about working with those limitations. And yet, just like the symbol, yin contains yang and yang contains yin: Pei is the one who is forging ahead with a practical AI project, whereas Eliezer is a moral philosopher looking for a mathematical ideal.
As I said about a previous discussion with Ben Goertzel, they seem to agree about the dangers, but not about how much the Singularity Institute might affect the outcome.
To rephrase the primary disagreement: "Yes, AIs are incredibly, world-threateningly dangerous, but there's nothing you can do about it."
This seems based around limited views of what sort of AI minds are possible or likely, such as an anthropomorphized baby which can be taught and studied like a human child.
The AGI community has ignored it, not because it is indisputable, but because people have not bothered to criticize it.
Here's Bill Hibbard criticizing FAI, and Ben Goertzel doing the same, and Shane Legg. Surely Pei would consider all of these people to be part of the AGI community? Perhaps Pei means that most of the AGI community has not bothered to criticize FAI, but then most of the AGI community has not bothered to criticize any particular AGI proposal, including his own NARS.
Does anyone see any other interpretation, besides that Pei is just mistake...
Luke, I'm wondering, when you wrote your replies to Pei Wang, did you try to model his potential answer? If yes, how close were you? If not, why not?
Great exchange! Very clear and civilized, I thought.
Wang seems to be hung up on this "adaptive" idea and is anthropomorphising the AI to be like humans (ignorant of changeable values). It will be interesting to see if he changes his mind as he reads Bostrom's stuff.
EDIT: in case it's not clear, I think Wang is missing a big piece of the puzzle (namely that AIs are optimizers (Yudkowsky), and optimizers will behave in certain dangerous ways (Bostrom))
For example, should mankind vigorously pursue research on how to make Ron Fouchier's alteration of the H5N1 bird flu virus even more dangerous and deadly to humans, because “higher safety can only be achieved by more research on all related topics”?
Yeah, I remember reading this argument and thinking that it does not hold water. The flu virus is a well-researched area. It may yet hold some surprises, sure, but we think we know quite a bit about it. We know enough to tell what is dangerous and what is not. AGI research is nowhere near this stage. My comparison would be someone screaming at Dmitri Ivanovsky in 1892, "do not research viruses until you know that this research is safe!"
My answer is that much of the research in this outline of open problems doesn't require us to know which AGI architecture will succeed first, for example the problem of representing human values coherently.
Do other AI researchers agree with your list of open problems worth researching? If you asked Dr. Wang about it, what was his reaction?
"Natural AI" is an oxymoron. There are lots of NIs (natural intelligences) scampering around killing millions of people.
And we're only a little over a hundred years into virus research, much less on intelligence. Give it another hundred.
Luke, what do you mean here when you say, "Friendly AI may be incoherent and impossible"?
The Singularity Institute's page "What is Friendly AI?" offers this definition: "A 'Friendly AI' is an AI that takes actions that are, on the whole, beneficial to humans and humanity." Surely you don't mean to say, "The idea of an AI that takes actions that are, on the whole, beneficial to humans and humanity may be incoherent or impossible"?
Eliezer's paper "Artificial Intelligence as a Positive and Neg...
If intelligence turns out to be adaptive (as believed by me and many others), then a “friendly AI” will be mainly the result of proper education, not proper design. There will be no way to design a “safe AI”, just like there is no way to require parents to only give birth to “safe baby” who will never become a criminal.
I don't think that follows. What consumer robot makers will want is the equivalent of a "safe baby" - one who will practically never become a criminal. That will require a tamper-proof brain, and many other safety features. ...
Pei seems to conflate the possibility of erroneous beliefs with the possibility of unfortunate (for us) goals. The Assumption of Insufficient Knowledge and Resources isn't what FAI is about, yet you get statements like
...As I mentioned above, the goal system of an adaptive system evolves as a function of the system’s experience. No matter what initial goals are implanted, under AIKR the derived goals are not necessarily their logical implications, which is not necessarily a bad thing (the humanity is not a logical implication of the human biological nature,
I'm finding these dialogues worthwhile for (so far) lowering my respect for "mainstream" AI researchers.
Pei Wang's definition of intelligence is just "optimization process" in fancy clothes.
His emphasis on raising an AI with prim and proper experience makes me realize that humans can't use our native architecture when thinking about AI problems. For so many people, "building a safe AI" just pattern-matches to "raising a child so he becomes a good citizen", even though these tasks have nothing to do with each other. But the ana...
Think about how ridiculous your comment must sound to them.
I have no reason to suspect that other people's use of the absurdity heuristic should cause me to reevaluate every argument I've ever seen.
That a de novo AGI will be nothing like a human child in terms of how to make it safe is an antiprediction in that it would take a tremendous amount of evidence to suggest otherwise, and yet Wang just assumes this without having any evidence at all. I can only conclude that the surface analogy is the entire content of the claim.
That you just assume that they must be stupid
If he were just stupid, I'd have no right to be indignant at his basic mistake. He is clearly an intelligent person.
They have probably thought about everything you know long before you and dismissed it.
You are not making any sense. Think about how ridiculous your comment must sound to me.
(I'm starting to hate that you've become a fixture here.)
This is great. Pei really impresses me, especially with the linked paper, “The Assumptions of Knowledge and Resources in Models of Rationality”. If you haven't read it, please read it. It clarifies everything Pei is saying and allows you to understand his perspective much better.
That said, I think Luke's final rebuttal was spot on, and I would like to find out whether Pei has changed his mind after reading "Superintelligent Will".
Luke remarked:
the only people allowed to do philosophy should be those with primary training in cognitive science, computer science, or mathematics.
I expect that this is meant as a metaphorical remark about the low value of some philosophy, rather than literally as a call for banning anyone from doing philosophy. However, this sort of comment squicks me, especially in this context.
I think we should expect AGIs to have more stable goal systems that are less affected by their beliefs and environment than humans. Remember that humans are a symbol processing system on top of a behavior learning system on top of an association learning system. And we don't let our beliefs propagate by default, and our brains experience physiological changes as a result of aging. It seems like there would be a lot more room for goal change in such a messy aging architecture.
The least intelligent humans tend not to be very cautious and tend to have poor im...
Sorry if this is a silly question, but what does AIKR refer to? Google has failed me in this regard.
Pei's point that FAI is hard/impossible seems to me to be an argument that we should be even more keen to stop AGI research. He makes a good argument that FAI doesn't work, but hasn't got anything to substitute in its place ("educating the AI" might be the only way to go, but that doesn't mean it's a good way to go).
Whether we can build a "safe AGI" by giving it a carefully designed "goal system": My answer is negative. It is my belief that an AGI will necessarily be adaptive, which implies that the goals it actively pursues constantly change as a function of its experience, and are not fully restricted by its initial (given) goals.
I don't see Pei distinguishing between instrumental and ultimate goals anywhere. Whereas Luke does do this. Maybe a failure to make that distinction explains the resulting muddle.
Update - 2012-04-23 - it looks as though he does do somethi...
Eliezer's epiphany about precision, which I completely subscribe to, negates most of Pei's arguments for me.
I guess Pei's intuition is that a proof of uniqueness or optimality under unrealistic assumptions is of little practical value, and doing such proofs under realistic assumptions is unfeasible compared to the approach he is taking.
ETA: When you write most kinds of software, you don't first prove that your design is optimal or unique, but just start with something that you intuitively think would work, and then refine it by trial and error. Why shouldn't this work for AGI?
ETA 2: In case it wasn't clear, I'm not advocating that we build AGIs by trial and error, but just trying to explain what Pei is probably thinking, and why cousin_it's link isn't likely to be convincing for him.
If you clear away all the noise arising from the fact that this interaction constitutes a clash of tribal factions...
Pei seems to conflate the possibility...
I'm finding these dialogues worthwhile for (so far) lowering my respect for "mainstream" AI researchers...
and so on.
I think it'd be great if SIAI would not latch onto the most favourable and least informative interpretation of any disagreement, in precisely the way that e.g. any community around free energy devices does. It'd also be great if Luke allowed for the possibility that Wang (and mos...
Better than the Goertzel dialogue, but why didn't you put self-improving AI on the table with the rest of the propositions? That's a fundamental piece of the puzzle in understanding Pei's position. It could be that he thinks a well-taught AI is safe enough because it won't self-modify.
If an AGI research group were close to success but did not respect friendly AI principles, should the government shut them down?
I think the most glaring problem I could detect with Pei's position is captured in this quotation:
Therefore, to control the morality of an AI mainly means to educate it properly (i.e., to control its experience, especially in its early years). Of course, the initial goals matter, but it is wrong to assume that the initial goals will always be the dominating goals in decision making processes.
This totally dodges Luke's point that we don't have a clue what such moral education would be like, because we don't understand these things about people. For thi...
Pei remarked:
In scientific theories, broader notions are not always better. In this context, a broad notion may cover too many diverse approaches to provide any non-trivial conclusion.
Sounds like Eliezer's advice to be specific, doesn't it? Or even the virtue of narrowness.
Part of the Muehlhauser interview series on AGI.
Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.
Pei Wang is an AGI researcher at Temple University, and Chief Executive Editor of Journal of Artificial General Intelligence.
Luke Muehlhauser
[Apr. 7, 2012]
Pei, I'm glad you agreed to discuss artificial general intelligence (AGI) with me. I hope our dialogue will be informative to many readers, and to us!
On what do we agree? Ben Goertzel and I agreed on the statements below (well, I cleaned up the wording a bit for our conversation):
You stated in private communication that you agree with these statements, depending on what is meant by "AGI." So, I'll ask: What do you mean by "AGI"?
I'd also be curious to learn what you think about AGI safety. If you agree that AGI is an existential risk that will arrive this century, and if you value humanity, one might expect you to think it's very important that we accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI. (This is what Anna Salamon and I recommend in Intelligence Explosion: Evidence and Import.) What are your thoughts on the matter?
Pei Wang:
[Apr. 8, 2012]
By “AGI” I mean computer systems that follow roughly the same principles as the human mind. Concretely, to me “intelligence” is the ability to adapt to the environment under insufficient knowledge and resources, or to follow the “Laws of Thought” that realize a relative rationality that allows the system to apply its available knowledge and resources as much as possible. See [1, 2] for detailed descriptions and comparisons to other definitions of intelligence.
Such a computer system will share many properties with the human mind; however, it will not have exactly the same behaviors or problem-solving capabilities as a typical human being, since, as an adaptive system, the behaviors and capabilities of an AGI depend not only on its built-in principles and mechanisms, but also on its body, initial motivation, and individual experience, which are not necessarily human-like.
Like all major breakthroughs in science and technology, the creation of AGI will be both a challenge and an opportunity for humankind. Like scientists and engineers in all fields, we AGI researchers should use our best judgment to ensure that AGI results in good things rather than bad things for humanity.
Even so, the suggestion to “accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI” is wrong, for the following major reasons:
In summary, though the safety of AGI is indeed an important issue, currently we don’t know enough about the subject to make any sure conclusion. Higher safety can only be achieved by more research on all related topics, rather than by pursuing approaches that have no solid scientific foundation. I hope your Institute will make a constructive contribution to the field by studying a wider range of AGI projects, rather than generalizing from a few, or committing to a conclusion without considering counterarguments.
Luke:
[Apr. 8, 2012]
I appreciate the clarity of your writing, Pei. “The Assumptions of Knowledge and Resources in Models of Rationality” belongs to a set of papers that make up half of my argument for why the only people allowed to do philosophy should be those with primary training in cognitive science, computer science, or mathematics. (The other half of that argument is made by examining most of the philosophy papers written by those without primary training in cognitive science, computer science, or mathematics.)
You write that my recommendation to “accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI” is wrong for four reasons, which I will respond to in turn:
I agree. Friendly AI may be incoherent and impossible. In fact, it looks impossible right now. But that’s often how problems look right before we make a few key insights that make things clearer, and show us (e.g.) how we were asking the wrong question in the first place. The reason I advocate Friendly AI research (among other things) is that it may be the only way to secure a desirable future for humanity (see “Complex Value Systems are Required to Realize Valuable Futures”), even if it looks impossible. That is why Yudkowsky once proclaimed: “Shut Up and Do the Impossible!” When we don’t know how to make progress on a difficult problem, sometimes we need to hack away at the edges.
I certainly agree that “currently we don’t know enough about [AGI safety] to make any sure conclusion.” That is why more research is needed.
As for your suggestion that “Higher safety can only be achieved by more research on all related topics,” I wonder if you think that is true of all subjects, or only in AGI. For example, should mankind vigorously pursue research on how to make Ron Fouchier's alteration of the H5N1 bird flu virus even more dangerous and deadly to humans, because “higher safety can only be achieved by more research on all related topics”? (I’m not trying to broadly compare AGI capabilities research to supervirus research; I’m just trying to understand the nature of your rejection of my recommendation for mankind to decelerate AGI capabilities research and accelerate AGI safety research.)
Hopefully I have clarified my own positions and my reasons for them. I look forward to your reply!
Pei:
[Apr. 10, 2012]
Luke: I’m glad to see the agreements, and will only comment on the disagreements.
For these reasons, under AIKR we cannot have AI with guaranteed safety or friendliness, though we can and should always do our best to make them safer, based on our best judgment (which can still be wrong, due to AIKR). Applying logic or probability theory to the design won’t change the big picture, because what we are after are empirical conclusions, not theorems within those theories. Only the latter can have proved correctness; the former cannot (though they can have strong evidential support).
“I’m just trying to understand the nature of your rejection of my recommendation for mankind to decelerate AGI capabilities research and accelerate AGI safety research”
Frankly, I don’t think anyone currently has the evidence or argument to ask the others to decelerate their research for safety consideration, though it is perfectly fine to promote your own research direction and try to attract more people into it. However, unless you get a right idea about what AGI is and how it can be built, it is very unlikely for you to know how to make it safe.
Luke:
[Apr. 10, 2012]
I didn’t mean to imply that my notion of AGI was “better” because it is broader. I was merely responding to your claim that my argument for differential technological development (in this case, decelerating AI capabilities research while accelerating AI safety research) depends on a narrow notion of AGI that you believe “will never be built.” But this isn’t true, because my notion of AGI is very broad and includes your notion of AGI as a special case. My notion of AGI includes both AIXI-like “intelligent” systems and also “intelligent” systems which obey AIKR, because both kinds of systems (if implemented/approximated successfully) could efficiently use resources to achieve goals, and that is the definition Anna and I stipulated for “intelligence.”
Let me back up. In our paper, Anna and I stipulate that for the purposes of our paper we use “intelligence” to mean an agent’s capacity to efficiently use resources (such as money or computing power) to optimize the world according to its preferences. You could call this “instrumental rationality” or “ability to achieve one’s goals” or something else if you prefer; I don’t wish to encourage a “merely verbal” dispute between us. We also specify that by “AI” (in our discussion, “AGI”) we mean “systems which match or exceed the intelligence [as we just defined it] of humans in virtually all domains of interest.” That is: by “AGI” we mean “systems which match or exceed the human capacity for efficiently using resources to achieve goals in virtually all domains of interest.” So I’m not sure I understood you correctly: Did you really mean to say that this kind of AGI “will never be built”? If so, why do you think that? Is the human mind very close to a natural ceiling on an agent’s ability to achieve goals?
What we argue in “Intelligence Explosion: Evidence and Import,” then, is that a very broad range of AGIs pose a threat to humanity, and therefore we should be sure we have the safety part figured out as much as we can before we figure out how to build AGIs. But this is the opposite of what is happening now. Right now, almost all AGI-directed R&D resources are being devoted to AGI capabilities research rather than AGI safety research. This is the case even though there is AGI safety research that will plausibly be useful given almost any final AGI architecture, for example the problem of extracting coherent preferences from humans (so that we can figure out which rules / constraints / goals we might want to use to bound an AGI’s behavior).
I do hope you have the chance to read “The Superintelligent Will.” It is linked near the top of nickbostrom.com and I will send it to you via email.
But perhaps I have been driving the direction of our conversation too much. Don’t hesitate to steer it towards topics you would prefer to address!
Pei:
[Apr. 12, 2012]
Hi Luke,
I don’t expect to resolve all the related issues in such a dialogue. In the following, I’ll return to what I think are the major issues and summarize my position.
Such a short position statement may not convince you, but I hope you can consider it at least as a possibility. I guess the final consensus can only come from further research.
Luke:
[Apr. 19, 2012]
Pei,
I agree that an AGI will be adaptive in the sense that its instrumental goals will adapt as a function of its experience. But I do think advanced AGIs will have convergently instrumental reasons to preserve their final (or “terminal”) goals. As Bostrom explains in “The Superintelligent Will”:
An agent is more likely to act in the future to maximize the realization of its present final goals if it still has those goals in the future. This gives the agent a present instrumental reason to prevent alterations of its final goals.
I also agree that even if an AGI’s final goals are fixed, the AGI’s behavior will also depend on its knowledge and resources, and therefore we can’t exactly predict its behavior. But if a system has lots of knowledge and resources, and we know its final goals, then we can predict with some confidence that whatever it does next, it will be something aimed at achieving those final goals. And the more knowledge and resources it has, the more confident we can be that its actions will successfully aim at achieving its final goals. So if a superintelligent machine’s only final goal is to play through Super Mario Bros within 30 minutes, we can be pretty confident it will do so. The problem is that we don’t know how to tell a superintelligent machine to do things we want, so we’re going to get many unintended consequences for humanity (as argued in “The Singularity and Machine Ethics”).
You also said that you can’t see what safety work there is to be done without having intelligent systems (e.g. “baby AGIs”) to work with. I provided a list of open problems in AI safety here, and most of them don’t require that we know how to build an AGI first. For example, one reason we can’t tell an AGI to do what humans want is that we don’t know what humans want, and there is work to be done in philosophy and in preference acquisition in AI in order to get clearer about what humans want.
Pei:
[Apr. 20, 2012]
Luke,
I think we have made our different beliefs clear, so this dialogue has achieved its goal. It won’t be an efficient usage of our time to attempt to convince each other at this moment, and each side can analyze these beliefs in proper forms of publication at a future time.
Now we can let the readers consider these arguments and conclusions.