Bill:
Currently we do not know how to build intelligent machines. When we do then we can apply those machines to learning human values. If a machine is sufficiently intelligent to pose an existential threat to humanity then it is sufficiently intelligent to learn human values.
Luke:
I generally agree that superintelligent machines capable of destroying humanity will be capable of learning human values and maximizing for them if "learn human values and maximize for them" is a coherent request at all. But capability does not imply motivation.
It seems like you were talking past each other here, and never got that fully resolved. Bill is entirely correct that a sufficiently intelligent machine would be able to learn human values. A UFAI might be motivated to do so for the purpose of manipulating humans. Making the AGI motivated to maximize human values is the hard part.
A recent StackExchange discussion suggests that self-improving general problem solvers are considered infeasible by relevant experts, i.e. computer scientists. This suggests that the risk of uFAI should be updated downwards.
They just say that, in their current form, their algorithms are too inefficient. That hardly sounds like the same thing!
I would agree that Benatar's "Better Never to Have Been" is not an accurate expression of human values. That's philosophical meanderings by one - perhaps depressed - individual. A few depressed people might want not to exist - but they don't represent human values very well.
I am more concerned that AI utility functions will serve a narrow group of humans.
I just talked about that in this comment, but it's more relevant to this thread, so I'll copy it here:
One could imagine an organization conspiring to create AGI that will optimize for the organization's collective preferences rather than humanity's collective preferences, but this won't happen because:
1. No one will throw a fit and defect from an FAI project because they won't be getting special treatment, but people will throw a fit if they perceive unfairness, so Friendly-to-humanity-AI will be a lot easier to get funding and community support for than friendly-to-exclusive-club-AI.
2. Our near mode reasoning cannot comprehend how much better a personalized AGI slave would be than FAI for us personally, so people will make that sort of decision in far mode, where idealistic values can outweigh greediness.
Finally, even if some exclusive club did somehow create an AGI that was friendly to them in particular, it wouldn't be that bad. Even if people don't care about each other very much, we do at least a little bit. Let's say that an AGI optimizing an exclusive club's CEV devotes 0.001% of its resources to things the rest of humanity would care about, and the rest to the things that just the club cares about. This is only worse than FAI by a factor of 10^5, which is negligible compared to the difference between FAI and UFAI.
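To make the arithmetic behind that factor explicit (under the commenter's simplifying assumption that the value delivered to the rest of humanity scales linearly with the share of resources devoted to its concerns):

$$
\frac{U_{\text{rest of humanity}}(\text{club-CEV AGI})}{U_{\text{rest of humanity}}(\text{FAI})} \approx 0.001\% = 10^{-5},
$$

i.e. the outcome is worse than FAI by a factor of $10^{5}$, which the commenter argues is small next to the gap between FAI and UFAI.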
I agree with many of Bill's comments. I too am more concerned about a "1% machine" than a technical accident that destroys civilization. A "1% machine" seems a lot more likely. This is largely a "values" issue. Some would argue that a "0%" outcome would be exceptionally bad - whereas a "1%" outcome would be "OK" - and thus is acceptable, according to the "maxipok" principle.
Just to clarify - by "1% machine" do you mean a machine which serves (the most powerful) 1% of humanity?
There's definitely a values issue as to how undesirable such an outcome would be compared to human extinction. I think there's also substantial disagreement between Bill & Luke about the relative probabilities of those outcomes though.
(As we've seen from the Hanson/Yudkowsky foom debate, drilling down to find the root cause of that kind of disagreement is really hard).
Just to clarify - by "1% machine" do you mean a machine which serves (the most powerful) 1% of humanity?
Yes, that's right.
Reinforcement learning uses a single scalar reward - by definition. Animals don't really work like that - they have many pain sensors whose signals are never combined into a single value, since some of them never get further than spinal cord reflexes.
However, the source of the reward does not seem to me to be a piece of RL dogma - though sure, some models put the reward in the agent's "environment".
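A minimal sketch of the contrast being drawn (the names and thresholds here are illustrative, not taken from any RL library or neuroscience model):

```python
# Standard RL setup: the environment emits one scalar reward per step, and the
# agent's objective is the discounted sum of that single scalar.
def scalar_rl_return(rewards, gamma=0.99):
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Animal-like setup: many nociceptive channels, some handled by local reflexes
# that never reach any central "sum of rewards" at all.
def animal_like_step(pain_channels, reflex_threshold=0.8):
    reflex_actions = {name: "withdraw"              # spinal-level reflex arc
                      for name, level in pain_channels.items()
                      if level > reflex_threshold}
    central_signals = {name: level                  # reaches the brain, but is
                       for name, level in pain_channels.items()
                       if level <= reflex_threshold}  # never collapsed to one scalar
    return reflex_actions, central_signals
```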
Part of the Muehlhauser series on AGI.
Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.
Bill Hibbard is an emeritus senior scientist at University of Wisconsin-Madison and the author of Super-Intelligent Machines.
Luke Muehlhauser:
[Apr. 8, 2012]
Bill, I'm glad you agreed to discuss artificial general intelligence (AGI) with me. I hope our dialogue will be informative to many readers, and to us!
On what do we agree? In separate conversations, Ben Goertzel and Pei Wang agreed with me on the following statements (though I've clarified the wording for our conversation):
You stated in private communication that you agree with these statements, so we have substantial common ground.
I'd be curious to learn what you think about AGI safety. If you agree that AGI is an existential risk that will arrive this century, and if you value humanity, one might expect you to think it's very important that we accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI before we develop arbitrary superhuman AGI. (This is what Anna Salamon and I recommend in Intelligence Explosion: Evidence and Import.) What are your thoughts on the matter?
And, which questions would you like to raise?
Bill Hibbard:
[Apr. 11, 2012]
Luke, thanks for the invitation to this dialog and thanks for making my minor edits to your six statements. I agree with them, with the usual reservation that they are predictions and hence uncertain.
Before I answer your other questions it would be useful to set a context by reference to an issue you and Anna raised in your excellent paper, Intelligence Explosion: Evidence and Import. In Section 4.1 you wrote:
I think this is overly pessimistic about machines learning human values. We humans are good at learning what other humans value, if we are close to them for a long period of time. It is reasonable to think that machines much more intelligent than humans will be able to learn what humans value.
Solomonoff induction and AIXI are about algorithms learning accurate world models. Human values are part of the world so the AIXI world model includes a model of the values of humans. Yudkowsky's CEV assumed that a future machine (called a Really Powerful Optimization Process) could learn the values of humans, and could even predict the values of humans assuming evolution to some convergent limit.
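For reference, Hutter's AIXI agent can be written schematically as choosing, at time $k$ with horizon $m$,

$$
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)},
$$

where the final sum ranges over programs $q$ consistent with the interaction history, weighted by the Solomonoff-style prior $2^{-\ell(q)}$. That mixture over programs is the "world model" referred to above: since humans and their values are part of the environment, any program that predicts the observations well must implicitly model them.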
I am less worried about an AI exterminating all humans because it cannot learn our values, and more worried about AI that accurately knows the values of its wealthy and powerful human masters and dominates the rest of humanity for its masters' benefit. I think the political problems of AI safety are more difficult than the technical problems.
My JAGI paper Model-based Utility Functions, still under review, is an effort to help with the technical problems of AI safety. So I do take those problems seriously. But I am more worried about the political problems than the technical problems.
In your Conclusion, you and Anna wrote:
To a significant extent we know how to achieve economic stability, by limiting debt/equity ratios for individuals and for financial institutions, and enforcing accounting standards that prohibit hiding debt and exaggerating equity. But there are serious political problems in imposing those limits and standards.
Now to your question about the rates of research on AI safety versus AI capabilities. My only previous expression about this issue has been a desire to accelerate the public's understanding of the social issues around AI, and their exercise of democratic control over the technology to ensure that it benefits everyone. For example, AI will enable machines to produce all the goods and services everyone needs, so we should use AI to eliminate poverty rather than causing poverty by depriving people of any way to earn a living. This is a political issue.
Even if you disagree with me about the relative importance of technical versus political problems with AI, I think you will still need politics to achieve your goals. To accelerate AI safety research and decelerate AI capabilities research on a global scale will require intense political effort. And once we think we know how to build safe AI, it will be a political problem to enforce this knowledge on AI designs.
I do agree about the need to accelerate AI safety research and think that the AGI conferences are very useful for that goal. However, the question of decelerating AI capability research is more difficult. AI will be useful in the struggle against poverty, disease and aging if we get the politics right. That argues for accelerating AI capabilities to alleviate human suffering. On the other hand, AI before the technical and political problems are solved will be a disaster. I think the resolution of this dilemma is to recognize that it will be impossible to decelerate AI capabilities, so any effort expended on that goal could be better applied to the political and technical problems of AI safety.
The struggle for wealth and power in human society, and the struggle to improve people's lives, are very strong forces accelerating AI capabilities. Organizations want to cut costs and improve performance by replacing human workers by machines. Businesses want to improve their services by sensing and analyzing huge amounts of consumer and other social data. Investors want to gain market advantage through all sorts of information processing. Politicians want to sense and analyze social data in order to gain political power. Militaries want to gain advantages by sensing and processing battlefield (which increasingly includes civil society) information. Medical researchers want to understand and cure diseases by sensing and analyzing enormously complex biological processes. There is a long list of strong social motives for accelerating AI capabilities, so I think it will be impossible to decelerate AI capabilities.
The question I'd raise is: How can we educate the public about the opportunities and dangers of AI, and how can we create a political movement in favor of AI benefitting all humans? The Singularity Institute's annual summits are useful for educating the public.
Luke:
[Apr 13th, 2012]
I'm not sure how much we disagree about machines learning human values. That is, as you note, how CEV might work: a seed AI learns our values, extrapolates them, and uses these extrapolated values to build a utility function for a successor superintelligence. I just don't think naive neural net approaches as in Guarini (2006) will work, for reasons explained in The Singularity and Machine Ethics.
So how do we program an AI to "Figure out what the humans want and do that"? That, I think, is a Very Hard Technical Problem that breaks down into lots of difficult sub-problems. I'm glad you're one of the relatively few people working in this problem space, e.g. with Model-Based Utility Functions.
I share your concerns about the political problems related to powerful AGI. That, too, is a Very Hard Problem. I'm not sure whether the technical or political problems are more difficult.
I agree that economic and political incentives will make it very difficult to decelerate AGI progress. But it's not impossible. Here are just two ideas:
Persuade key AGI researchers of the importance of safety. The most promising AGI work is done by a select few individuals. It may be possible to get them to grok that AGI capabilities research merely brings forward the date by which we must solve these Very Hard Problems, and it's already unclear whether we'll be able to solve them in time. Researchers have changed their minds before about the moral value of their work. Joseph Rotblat resigned from the Manhattan Project due to conscience. Leo Szilard switched from physics to molecular biology when he realized the horror of atomic weapons. Michael Michaud resigned from SETI in part because of concerns about the danger of contacting advanced civilizations. And of course, Eliezer Yudkowsky once argued that we should bring about the Singularity as quickly as possible, and then made a complete U-turn. If we can change the minds of a few key AGI scientists, it may be that key insights into AGI are delayed by years or decades. Somebody would have come up with General Relativity if Einstein hadn't, but we may have had to wait a few decades.
Get machine superintelligence treated as a weapon of mass destruction. This could make it more difficult to create machine superintelligence. On the other hand, if we got governments to take machine superintelligence this seriously, our actions might instead trigger an arms race and accelerate progress toward machine superintelligence.
But yes: it will be easier to accelerate AGI safety research, which means doing things like (1) raising awareness, (2) raising funds for those doing AGI safety research, and (3) drawing more brainpower to the relevant research problems.
Finally, you asked:
I hope that deliverables like the Singularity Summit and Facing the Singularity go some distance toward educating the public. But I'm not sure how much this helps. If we educate the masses, are they capable of making wise decisions with that kind of complex information? History, I think, suggests the answer may be "No." Unfortunately, intelligent and well-educated experts may think just as poorly as laymen about these issues.
Do you have your own thoughts on how to reduce the severity of the political problem?
Also: I think my biggest problem is there not being enough good researchers working on the sub-problems of AGI safety. Do you have suggestions for how to attract more brainpower to the field?
Bill:
[16 April 2012]
Luke, here are responses to a few questions from your message, starting with:
I'm glad we don't have much to disagree about here and I certainly do not advocate a naive neural net approach. Currently we do not know how to build intelligent machines. When we do then we can apply those machines to learning human values. If a machine is sufficiently intelligent to pose an existential threat to humanity then it is sufficiently intelligent to learn human values.
Yes, the lack of wisdom by the public and by experts is frustrating. It is good that the Singularity Institute has done so much to describe cognitive biases that affect human decision making.
On the other hand, there have been some successes in the environmental, consumer safety, civil rights and arms control movements. I'd like to see a similar political movement in favor of AI benefitting all humans. Perhaps this movement can build on current social issues linked to information technology, like privacy and the economic plight of people whose skills become obsolete in our increasingly automated and networked economy.
One thing I'd like to see is more AI experts speaking publicly about the political problem. When they get an opportunity to speak to a wide audience on TV or in other mass venues, they often talk about the benefits without addressing the risks. I discussed this issue in an H+ Magazine article. My article specifically noted the 2009 Interim Report of the AAAI Presidential Panel on Long-Term AI Futures, Ray Kurzweil's book The Singularity is Near, and Jaron Lanier's 2010 New York Times op-ed. In my opinion each of these seriously underestimated the risks of AI. Of course these experts are entitled to their own opinions. However, I believe they discount the risk because they only consider the risk of AI escaping all human control - what we are calling the technical risk - which they do not take seriously. I think these experts do not consider the risk that AI will enable a small group of humans to dominate the rest of humanity, what I call the political risk. I have tried to raise the political risk in letters to the editor of the New York Times and in my other publications, but I have not been very effective in reaching the broader culture.
I'd also like to see the Singularity Institute address the political as well as the technical risks. Making you the Executive Director is hopefully a step in the right direction.
That is an excellent question. The mathematical theory of AGI seems to have taken off since Hutter's AIXI in 2005. Now the AGI conferences and JAGI are providing respectable publication venues specifically aimed at the mathematical theory of AGI. So hopefully smart young researchers will be inspired and see opportunities to work in this theory, including AGI safety.
And if more AI experts will speak about the risks of AI in the broader culture, that would probably also inspire young researchers, and research funding agencies, to focus on AGI safety.
Luke:
[Apr. 16, 2012]
It looks to me like we do disagree significantly about the difficulty of getting a machine to learn human values, so let me ask you more about that. You write:
I generally agree that superintelligent machines capable of destroying humanity will be capable of learning human values and maximizing for them if "learn human values and maximize for them" is a coherent request at all. But capability does not imply motivation. The problem is that human extinction is a convergent outcome of billions of possible goal systems in superintelligent AI, whereas getting the first superintelligent AI to learn human values and maximize for them is like figuring out how to make the very first atomic bomb explode in the shape of an elephant.
To illustrate, consider the Golem Genie scenario:
So let's say you want to specify "Learn human values and maximize that" as a utility function. How do you do that? That seems incredibly hard to me, for reasons I can explore in detail if you wish.
And that's not where most of the difficulty is, either. Most of the difficulty is in building an AI capable of executing that kind of utility function in the first place, as opposed to some kluge or reinforcement learner or something that isn't even capable of taking goals like that and therefore isn't even capable of doing what we want when given superintelligence.
Bill:
[Apr. 20, 2012]
Your Golem Genie poses an interesting dilemma. But "hedonistic utilitarianism" is not an accurate summary of human values. Reasonable humans would not value a history that converges on the universe tiled with tiny minds enjoying constant maximum pleasure and neither would an AI implementing an accurate model of human values.
But any summary of values that a human could write down would certainly be inaccurate. Consider an analogy with automatic language processing. Smart linguists have spent generations formulating language rules, yet attempts to automate language processing based on those rules have worked poorly. Recent efforts at language processing by automated statistical learning of rules from large amounts of actual human language use have been more successful (although still inaccurate because they are not yet truly intelligent). It seems that humans cannot write down rules that accurately model their behavior, including rules that model their values.
I do agree with you that there are tricky problems in getting from human values to a utility function, although you probably think it is a bigger, more dangerous problem than I do. The issues include:
While these are all tricky problems, every reasonable person assigns maximal negative utility value to any history that includes human extinction. It seems reasonable that a solution for deriving an AI utility function from human values should not cause human extinction.
I think the most difficult issue is the strong human tendency toward group identities and treating people differently based on group membership. There are groups of people who wish the extinction or subjugation of other groups of people. Filtering such wishes out of human values, in forming AI utility functions, may require some sort of "semantic surgery" on values (just to be clear, this "semantic surgery" consists of changes to the values being processed into an AI utility function, not any changes to the people holding those values).
I am familiar with the concern that an AI will have an instrumental sub-goal to gather resources in order to increase its ability to maximize its utility function, and that it may exterminate humans in order to reuse the atoms from their bodies as resources for itself. However, any reasonable human would assign maximum negative value to any history that includes exterminating humans for their atoms and so would a reasonable utility function model of human values. An AI motivated by such a utility function would gather atoms only to the extent that such behavior increases utility. Gathering atoms to the extent that it exterminates humans would have maximum negative utility and the agent wouldn't do it.
In your recent message you wrote:
I am curious about your attitude toward reinforcement learning. Here you disparage it, but in Section 2.4 of Intelligence Explosion: Evidence and Import you discuss AIXI as an important milestone towards AI, and AIXI is a reinforcement learner.
One problem with reinforcement learning is that it is commonly restricted to utility functions whose values are rewards sent to the agent from the environment. I don't think this is a reasonable model of the way that real-world agents compute their utility functions. (Informally I've used reinforcement learning to refer to any agent motivated to maximize sums of discounted future utility function values, but in my mathematical papers about AGI theory I try to conform to standard usage.) The AGI-11 papers of Dewey, and Ring and Orseau also demonstrate problems with reinforcement learning. These problems can be avoided using utility function definitions that are not restricted to rewards from the environment to the agent.
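A minimal sketch of this distinction, with hypothetical class and method names chosen for illustration (this is not code from Model-based Utility Functions, just the general shape of the two designs):

```python
class RewardBasedAgent:
    """Standard reinforcement learner: utility IS the scalar reward signal
    delivered by the environment."""
    def __init__(self, policy):
        self.policy = policy

    def step(self, observation, reward):
        # The environment hands the agent one scalar reward; the agent's only
        # goal is to maximize the (discounted) sum of these rewards.
        self.utility = reward
        return self.policy(observation)


class ModelBasedUtilityAgent:
    """Agent whose utility function is defined over its own learned world
    model rather than over a reward channel in the environment."""
    def __init__(self, policy, world_model, utility_fn):
        self.policy = policy
        self.world_model = world_model      # learned model of the environment
        self.utility_fn = utility_fn        # evaluates modeled world states

    def step(self, observation):
        state_estimate = self.world_model.update(observation)
        # Utility depends on what the agent believes the world is like, not on
        # a signal the environment sends it.
        self.utility = self.utility_fn(state_estimate)
        return self.policy(state_estimate)
```

The point of the second design is that the quantity being maximized is computed from the agent's model of the world, which is why it is not restricted to rewards sent from the environment to the agent.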
Luke:
[Apr. 20, 2012]
We seem to agree that "any summary of values that a human could write down would certainly be inaccurate" (including, e.g., hedonistic utilitarianism). You have nicely summarized some of the difficulties we face in trying to write a utility function that captures human values, but if you explained why this problem is not going to be as big a problem as I am saying it is, then I missed it. You also seem to agree that convergent instrumental goals for (e.g.) resource acquisition pose an existential threat to humanity (as argued by Omohundro 2008 and Bostrom 2012), and again if you have a solution in mind, I missed it. So I'm not sure why you're less worried about the technical problems than I am.
As for reinforcement learners: I don't "disparage" them, I just say that they aren't capable of taking human-friendly goals. Reinforcement learning is indeed an important advance in AI, but we can't build a Friendly AI from a reinforcement learner (or from other existing AI architectures), which is why AI by default will be disastrous for humans. As Dewey (2011) explains:
(Above, I am using "reinforcement learner" in the traditional, narrower sense.)
Back to the "technical problem" of defining a human-friendly utility function and building an AI architecture capable of maximizing it. Do you have a sense for why you're less worried about the difficulty of that problem than I am?
Bill:
[April 23, 2012]
I think our assessments of the relative danger from the technical versus the political problem come from our different intuitions. I do not have a proof of a solution to the technical problem. If I did, I would publish it in a paper.
In your message of 16 April 2012 you wrote:
Yes, please. I'd be interested if you'd like to explain your reasons.
Luke:
[May 1, 2012]
I tried to summarize some aspects of the technical difficulty of Friendly AI in "The Singularity and Machine Ethics."
But let's zoom in on the proposal to specify P as a utility function in an AI, where P = "Learn human values and maximize their satisfaction." There are two types of difficulty we might face when trying to implement this project:
I think we are in situation #2, which is why the technical difficulty of Friendly AI is so hard. At the moment, it looks like we'd need to solve huge swaths of philosophy before being able to build Friendly AI, and that seems difficult because most philosophical questions aren't considered to be "solved" after 2.5 millennia of work.
For example, consider the following questions:
Since I don't know what I "mean" by these things, I sure as hell can't tell an AI what I mean. I couldn't even describe it in conversation, let alone encode it precisely in a utility function. Occasionally, creative solutions to the problem are proposed, like Paul Christiano's, but they are only nascent proposals and suffer major difficulties of their own.
Bill:
[May 9, 2012]
Yes, there are technical problems of expressing a person's values as a utility function, normalizing utility functions between different people, and combining the utility functions of different people to derive an overall AI utility function.
However, every reasonable person assigns maximally negative utility to human extinction, so over a broad range of solutions to these technical problems, the overall AI utility function will assign maximally negative value to any future history that includes human extinction.
Do you agree with this?
If you do agree, how does this lead to a high probability of human extinction?
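A toy sketch of the claim, with a simple normalize-and-average aggregation standing in for the "broad range of solutions" (the functions below are illustrative, not a proposal from either participant): if every individual's utility function takes its minimum on extinction histories, then the aggregate does too.

```python
def normalize(u, histories):
    """Rescale one person's utility function to [0, 1] over the given histories."""
    values = [u(h) for h in histories]
    lo, hi = min(values), max(values)
    return (lambda h: (u(h) - lo) / (hi - lo)) if hi > lo else (lambda h: 0.0)

def aggregate(individual_utilities, histories):
    """Combine many normalized utility functions by simple averaging."""
    normed = [normalize(u, histories) for u in individual_utilities]
    return lambda h: sum(n(h) for n in normed) / len(normed)

# If u_i(extinction) is person i's minimum for every i, each normalized value
# is 0.0 there, so the average is also 0.0 -- the aggregate's minimum. This is
# the property Bill relies on; Luke's reply below disputes the premise that
# everyone assigns their minimum to extinction, not the arithmetic.
```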
Luke:
[June 10, 2012]
Bill, you acknowledge the technical difficulty of expressing a person's values as a utility function (see the literature on "preference acquisition"), and of combining many utility functions to derive a desirable AI utility function (see the literatures on "population ethics" and "social choice"). But if I understand you correctly, you seem to think that:
I think proposition P is false, for several reasons:
First: Most people think there are fates worse than death, so a future including human extinction wouldn't be assigned maximally negative utility.
Second: Even ignoring this, many people think human extinction would be better than the status quo (at least, if human extinction could occur without much suffering). See Benatar's Better Never to Have Been. Many negative utilitarians agree with Benatar's conclusion, if not his reasoning. The surest way to minimize conscious suffering is to eliminate beings capable of conscious suffering.
Third: There are many problems similar to problem #1 I gave above ("What counts as a 'human'?"). Suppose we manage to design an AI architecture that has preferences over states of affairs (or future histories) as opposed to preferences over a reward function, and we define a utility function for the AI that might be sub-optimal in many ways but "clearly" assigns utility 0 (or negative 1m, or whatever) to human extinction. We're still left with the problems of specification explained in The Singularity and Machine Ethics. A machine superintelligence would possess two features that make it dangerous to give it any utility function at all:
We can't predict exactly what an advanced AI would do, but sci-fi stories like With Folded Hands provide concrete illustrations of the sort of thing that could go wrong as we try to tell advanced AIs "Don't let humans die" in a way that captures the hidden complexity of what we mean by that.
Bill:
[June 24, 2012]
A first point: your description of my proposition P included "almost any AI utility function resulting from an attempt to combine the preferences of humans" but what I wrote was "over a broad range of solutions to these technical problems" which is not quite the same. I think the inconsistency of individual human values and the ambiguity of combining values from multiple humans imply that there is no perfect solution. Rather there is a range of adequate solutions that fall within what we might call the "convex hull" of the inconsistency and ambiguity. I should remove the word "broad" from my proposition, because I didn't mean "almost any."
Now I address your three reasons for thinking P is false:
Furthermore, the prime directive in the story, "to serve and obey and guard men from harm," is not included in what I called "a broad range of solutions to these technical problems." It is not an attempt to resolve the inconsistency of individual human values or the ambiguity of combining values from different people. The nine-word prime directive was designed to make the story work, rather than as a serious effort to create safe AI. Humans value liberty and the mechanicals in the story are not behaving in accord with those values. In fact, the prime directive suffers from ambiguities similar to those that affect Asimov's Laws of Robotics: forced lobotomy is not serving or obeying humans.
More generally, I agree that the superpower and literalness of AI imply that poorly specified AI utility functions will lead to serious risks. The question is how narrow the target is for an AI utility function that avoids human extinction.
My proposition was about human extinction but I am more concerned about other threats. Recall that I asked to reword "It is a potential existential risk" as "It poses existential and other serious risks" in our initial agreement at the start of our dialog. I am more concerned that AI utility functions will serve a narrow group of humans.
Luke:
[June 27, 2012]
Replies below...
Oh, okay. Sorry to have misinterpreted you.
Okay.
I'm confused, so maybe we mean different things by "human values." Clearly, many humans today believe and claim that, for example, they are negative utilitarians and would prefer to minimize suffering, even if this implies eliminating all beings capable of suffering (in a painless way). Are the things you're calling "human values" not ever dependent on abstract reasoning and philosophical reflection? Are you using the term "human values" to refer only to some set of basic drives written into humans by evolution?
Right, I just meant to use With Folded Hands as an example of unintended consequences aka Why It Is Very Hard to Precisely Specify What We Want.
Right. But of course there are lots of futures we hate besides extinction, too, such as the ones where we are all lobotomized. Or the ones where we are kept in zoos and AI-run science labs. Or the ones where the AIs maximize happiness by rejiggering our dopamine reward systems and then leave us to lie still on the ground attached to IVs. Or the ones where we get everything we want except for the one single value we call "novelty," and thus the AI tiles the solar system with tiny digital minds replaying nothing but the single most amazing experience again and again until the Sun explodes.
So, I'm still unsure how much we disagree. I'm very worried about AI utility functions designed to serve only a narrow group of humans, but I'm just as worried about people who try to build an AGI with an altruistic utility function who simply don't take the problems of unintended consequences seriously enough.
This has been a helpful and clarifying conversation. Do you think there's more for us to discuss at this time?
Bill:
[7 July 2012]
Yes, this has been a useful dialog and we have both expressed our views reasonably clearly. So I am happy to end it for now.