Part of the Muehlhauser series on AGI.

Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.

Bill Hibbard is an emeritus senior scientist at University of Wisconsin-Madison and the author of Super-Intelligent Machines.


Luke Muehlhauser:

[Apr. 8, 2012]

Bill, I'm glad you agreed to discuss artificial general intelligence (AGI) with me. I hope our dialogue will be informative to many readers, and to us!

On what do we agree? In separate conversations, Ben Goertzel and Pei Wang agreed with me on the following statements (though I've clarified the wording for our conversation):

  1. Involuntary death is bad, and can be avoided with the right technology.
  2. Humans can be enhanced by merging with technology.
  3. Humans are on a risky course in general, because powerful technologies can destroy us, humans often behave not in their own best interests, and we are unlikely to voluntarily halt technological progress.
  4. AGI is likely this century.
  5. AGI will greatly transform the world. It poses existential and other serious risks, but could also be the best thing that ever happens to us if we do it right.
  6. Careful effort will be required to ensure that AGI results in good things rather than bad things for humanity.

You stated in private communication that you agree with these statements, so we have substantial common ground.

I'd be curious to learn what you think about AGI safety. If you agree that AGI is an existential risk that will arrive this century, and if you value humanity, one might expect you to think it's very important that we accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI before we develop arbitrary superhuman AGI. (This is what Anna Salamon and I recommend in Intelligence Explosion: Evidence and Import.) What are your thoughts on the matter?

And, which questions would you like to raise?

Bill Hibbard:

[Apr. 11, 2012]

Luke, thanks for the invitation to this dialog and thanks for making my minor edits to your six statements. I agree with them, with the usual reservation that they are predictions and hence uncertain.

Before I answer your other questions it would be useful to set a context by reference to an issue you and Anna raised in your excellent paper, Intelligence Explosion: Evidence and Import. In Section 4.1 you wrote:

Unfortunately, specifying what humans value may be extraordinarily difficult, given the complexity and fragility of human preferences (Yudkowsky 2011; Muehlhauser and Helm, this volume), and allowing an AI to learn desirable goals from reward and punishment may be no easier (Yudkowsky 2008a). If this is correct, then the creation of self-improving AI may be detrimental by default unless we first solve the problem of how to build an AI with a stable, desirable utility function - a "Friendly AI" (Yudkowsky 2001).

I think this is overly pessimistic about machines learning human values. We humans are good at learning what other humans value, if we are close to them for a long period of time. It is reasonable to think that machines much more intelligent than humans will be able to learn what humans value.

Solomonoff induction and AIXI are about algorithms learning accurate world models. Human values are part of the world, so the AIXI world model includes a model of the values of humans. Yudkowsky's CEV assumed that a future machine (called a Really Powerful Optimization Process) could learn the values of humans, and could even predict the values of humans assuming evolution to some convergent limit.
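To make the reference concrete, here is the standard AIXI action-selection rule in rough form (a sketch following common presentations, with horizon details elided; the notation is mine, not a quotation from Hutter):

```latex
% Sketch of AIXI's expectimax action selection: future rewards are summed
% over all environment programs q on a universal machine U, weighted by the
% Solomonoff-style prior 2^{-\ell(q)}, where \ell(q) is the length of q.
a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
          \bigl[\, r_k + \cdots + r_m \,\bigr]
          \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The inner sum is where "learning accurate world models" enters: the environment, including the humans in it and their values, is modeled by the weighted mixture of programs consistent with the agent's history.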

I am less worried about an AI exterminating all humans because it cannot learn our values, and more worried about AI that accurately knows the values of its wealthy and powerful human masters and dominates the rest of humanity for its masters' benefit. I think the political problems of AI safety are more difficult than the technical problems.

My JAGI paper, Model-based Utility Functions (still under review), is an effort to help with the technical problems of AI safety. So I do take those problems seriously. But I am more worried about the political problems than the technical problems.

In your Conclusion, you and Anna wrote:

... and achieving economic stability is ultimately a problem of being smart enough to figure out how to achieve it.

To a significant extent we know how to achieve economic stability, by limiting debt/equity ratios for individuals and for financial institutions, and enforcing accounting standards that prohibit hiding debt and exaggerating equity. But there are serious political problems in imposing those limits and standards.

Now to your question about the rates of research on AI safety versus AI capabilities. My only previous expression about this issue has been a desire to accelerate the public's understanding of the social issues around AI, and their exercise of democratic control over the technology to ensure that it benefits everyone. For example, AI will enable machines to produce all the goods and services everyone needs, so we should use AI to eliminate poverty rather than causing poverty by depriving people of any way to earn a living. This is a political issue.

Even if you disagree with me about the relative importance of technical versus political problems with AI, I think you will still need politics to achieve your goals. To accelerate AI safety research and decelerate AI capabilities research on a global scale will require intense political effort. And once we think we know how to build safe AI, it will be a political problem to enforce this knowledge on AI designs.

I do agree about the need to accelerate AI safety research and think that the AGI conferences are very useful for that goal. However, the question of decelerating AI capability research is more difficult. AI will be useful in the struggle against poverty, disease and aging if we get the politics right. That argues for accelerating AI capabilities to alleviate human suffering. On the other hand, AI before the technical and political problems are solved will be a disaster. I think the resolution of this dilemma is to recognize that it will be impossible to decelerate AI capabilities, so any effort expended on that goal could be better applied to the political and technical problems of AI safety.

The struggle for wealth and power in human society, and the struggle to improve people's lives, are very strong forces accelerating AI capabilities. Organizations want to cut costs and improve performance by replacing human workers with machines. Businesses want to improve their services by sensing and analyzing huge amounts of consumer and other social data. Investors want to gain market advantage through all sorts of information processing. Politicians want to sense and analyze social data in order to gain political power. Militaries want to gain advantages by sensing and processing information from the battlefield (which increasingly includes civil society). Medical researchers want to understand and cure diseases by sensing and analyzing enormously complex biological processes. There is a long list of strong social motives for accelerating AI capabilities, so I think it will be impossible to decelerate AI capabilities.

The question I'd raise is: How can we educate the public about the opportunities and dangers of AI, and how can we create a political movement in favor of AI benefitting all humans? The Singularity Institute's annual summits are useful for educating the public.


Luke:

[Apr. 13, 2012]

I'm not sure how much we disagree about machines learning human values. That is, as you note, how CEV might work: a seed AI learns our values, extrapolates them, and uses these extrapolated values to build a utility function for a successor superintelligence. I just don't think naive neural net approaches as in Guarini (2006) will work, for reasons explained in The Singularity and Machine Ethics.

So how do we program an AI to "Figure out what the humans want and do that"? That, I think, is a Very Hard Technical Problem that breaks down into lots of difficult sub-problems. I'm glad you're one of the relatively few people working in this problem space, e.g. with Model-Based Utility Functions.

I share your concerns about the political problems related to powerful AGI. That, too, is a Very Hard Problem. I'm not sure whether the technical or political problems are more difficult.

I agree that economic and political incentives will make it very difficult to decelerate AGI progress. But it's not impossible. Here are just two ideas:

Persuade key AGI researchers of the importance of safety. The most promising AGI work is done by a select few individuals. It may be possible to get them to grok that AGI capabilities research merely brings forward the date by which we must solve these Very Hard Problems, and it's already unclear whether we'll be able to solve them in time. Researchers have changed their minds before about the moral value of their work. Joseph Rotblat resigned from the Manhattan Project as a matter of conscience. Leo Szilard switched from physics to molecular biology when he realized the horror of atomic weapons. Michael Michaud resigned from SETI in part because of concerns about the danger of contacting advanced civilizations. And of course, Eliezer Yudkowsky once argued that we should bring about the Singularity as quickly as possible, and then made a complete U-turn. If we can change the minds of a few key AGI scientists, key insights into AGI may be delayed by years or decades. Somebody would have come up with General Relativity if Einstein hadn't, but we might have had to wait a few decades.

Get machine superintelligence treated as a weapon of mass destruction. This could make it more difficult to create machine superintelligence. On the other hand, if we got governments to take machine superintelligence this seriously, our actions might instead trigger an arms race and accelerate progress toward machine superintelligence.

But yes: it will be easier to accelerate AGI safety research, which means doing things like (1) raising awareness, (2) raising funds for those doing AGI safety research, and (3) drawing more brainpower to the relevant research problems.

Finally, you asked:

How can we educate the public about the opportunities and dangers of AI, and how can we create a political movement in favor of AI benefitting all humans?

I hope that deliverables like the Singularity Summit and Facing the Singularity go some distance toward educating the public. But I'm not sure how much this helps. If we educate the masses, are they capable of making wise decisions with that kind of complex information? History, I think, suggests the answer may be "No." Unfortunately, intelligent and well-educated experts may think just as poorly as laymen about these issues.

Do you have your own thoughts on how to reduce the severity of the political problem?

Also: I think my biggest problem is that there are not enough good researchers working on the sub-problems of AGI safety. Do you have suggestions for how to attract more brainpower to the field?


Bill:

[Apr. 16, 2012]

Luke, here are responses to a few questions from your message, starting with:

I'm not sure how much we disagree about machines learning human values. That is, as you note, how CEV might work: a seed AI learns our values, extrapolates them, and uses these extrapolated values to build a utility function for a successor superintelligence. I just don't think naive neural net approaches as in Guarini (2006) will work, for reasons explained in The Singularity and Machine Ethics.

I'm glad we don't have much to disagree about here and I certainly do not advocate a naive neural net approach. Currently we do not know how to build intelligent machines. When we do then we can apply those machines to learning human values. If a machine is sufficiently intelligent to pose an existential threat to humanity then it is sufficiently intelligent to learn human values.

If we educate the masses, are they capable of making wise decisions with that kind of complex information? History, I think, suggests the answer may be "No." Unfortunately, intelligent and well-educated experts may think just as poorly as laymen about these issues.  

Yes, the lack of wisdom by the public and by experts is frustrating. It is good that the Singularity Institute has done so much to describe cognitive biases that affect human decision making.

On the other hand, there have been some successes in the environmental, consumer safety, civil rights and arms control movements. I'd like to see a similar political movement in favor of AI benefitting all humans. Perhaps this movement can build on current social issues linked to information technology, like privacy and the economic plight of people whose skills become obsolete in our increasingly automated and networked economy.

Do you have your own thoughts on how to reduce the severity of the political problem?

One thing I'd like to see is more AI experts speaking publicly about the political problem. When they get an opportunity to speak to a wide audience on TV or in other mass venues, they often talk about the benefits without addressing the risks. I discussed this issue in an H+ Magazine article. My article specifically noted the 2009 Interim Report of the AAAI Presidential Panel on Long-Term AI Futures, Ray Kurzweil's book The Singularity Is Near, and Jaron Lanier's 2010 New York Times op-ed. In my opinion each of these seriously underestimated the risks of AI. Of course these experts are entitled to their own opinions. However, I believe they discount the risk because they consider only the risk of AI escaping all human control (what we are calling the technical risk), which they do not take seriously. I think these experts do not consider the risk that AI will enable a small group of humans to dominate the rest of humanity, which is what I call the political risk. I have tried to raise the political risk in letters to the editor of the New York Times and in my other publications, but I have not been very effective in reaching the broader culture.

I'd also like to see the Singularity Institute address the political as well as the technical risks. Making you the Executive Director is hopefully a step in the right direction.

Also: I think my biggest problem is that there are not enough good researchers working on the sub-problems of AGI safety. Do you have suggestions for how to attract more brainpower to the field?

That is an excellent question. The mathematical theory of AGI seems to have taken off since Hutter's AIXI in 2005. Now the AGI conferences and JAGI are providing respectable publication venues specifically aimed at the mathematical theory of AGI. So hopefully smart young researchers will be inspired and see opportunities to work in this theory, including AGI safety.

And if more AI experts speak about the risks of AI in the broader culture, that would probably also inspire young researchers, and research funding agencies, to focus on AGI safety.


Luke:

[Apr. 16, 2012]

It looks to me like we do disagree significantly about the difficulty of getting a machine to learn human values, so let me ask you more about that. You write:

Currently we do not know how to build intelligent machines. When we do then we can apply those machines to learning human values. If a machine is sufficiently intelligent to pose an existential threat to humanity then it is sufficiently intelligent to learn human values.

I generally agree that superintelligent machines capable of destroying humanity will be capable of learning human values and maximizing for them if "learn human values and maximize for them" is a coherent request at all. But capability does not imply motivation. The problem is that human extinction is a convergent outcome of billions of possible goal systems in superintelligent AI, whereas getting the first superintelligent AI to learn human values and maximize for them is like figuring out how to make the very first atomic bomb explode in the shape of an elephant.

To illustrate, consider the Golem Genie scenario:

Suppose an unstoppably powerful genie appears to you and announces that it will return in fifty years. Upon its return, you will be required to supply it with a [utility function] which it will then enforce with great precision throughout the universe. For example, if you supply the genie with hedonistic utilitarianism, it will maximize pleasure by harvesting all available resources and using them to tile the universe with identical copies of the smallest possible mind, each copy of which will experience an endless loop of the most pleasurable experience possible.

Let us call this precise, instruction-following genie a Golem Genie. (A golem is a creature from Jewish folklore that would in some stories do exactly as told... often with unintended consequences, for example polishing a dish until it is as thin as paper...)

If by the appointed time you fail to supply your Golem Genie with a [utility function], then it will permanently model its goal system after the first logically coherent [utility function] that anyone articulates to it, and that’s not a risk you want to take. Moreover, once you have supplied the Golem Genie with its [utility function], there will be no turning back. Until the end of time, the genie will enforce that one [utility function] without exception, not even to satisfy its own (previous) desires.

So let's say you want to specify "Learn human values and maximize that" as a utility function. How do you do that? That seems incredibly hard to me, for reasons I can explore in detail if you wish.

And that's not where most of the difficulty is, either. Most of the difficulty is in building an AI capable of executing that kind of utility function in the first place, as opposed to some kluge or reinforcement learner or something that isn't even capable of taking goals like that and therefore isn't even capable of doing what we want when given superintelligence.


Bill:

[Apr. 20, 2012]

Your Golem Genie poses an interesting dilemma. But "hedonistic utilitarianism" is not an accurate summary of human values. Reasonable humans would not value a history that converges on the universe tiled with tiny minds enjoying constant maximum pleasure, and neither would an AI implementing an accurate model of human values.

But any summary of values that a human could write down would certainly be inaccurate. Consider an analogy with automatic language processing. Smart linguists have spent generations formulating language rules, yet attempts to automate language processing based on those rules have worked poorly. Recent efforts at language processing by automated statistical learning of rules from large amounts of actual human language use have been more successful (although still inaccurate because they are not yet truly intelligent). It seems that humans cannot write down rules that accurately model their behavior, including rules that model their values.

I do agree with you that there are tricky problems in getting from human values to a utility function, although you probably think it is a bigger, more dangerous problem than I do. The issues include:

  1. A person's value preferences often violate the axioms of expected utility theory (transitivity, for example), making it impossible to express their preferences exactly as a utility function.
  2. A person's preferences vary over time depending on the person's experiences (e.g., propaganda works).
  3. Different people have different preferences, which need to be combined into a single utility function for an AI. Of course this only needs to be done if the AI serves more than one person. If the AI serves a single person or a small group, that is what I call the political problem. (A toy sketch of this combination step follows this list.)
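Here is the toy sketch referred to in issue 3. It is purely illustrative: the min-max normalization, the equal weighting, and the function names are assumptions made for the example, not anyone's actual proposal; the point is only to show where the normalization and weighting ambiguities enter.

```python
from typing import Dict, List

Outcome = str

def normalize(utilities: Dict[Outcome, float]) -> Dict[Outcome, float]:
    """Min-max rescale one person's utilities to [0, 1].

    The rescaling is itself an arbitrary choice: an individual's preferences
    are preserved under any positive affine transformation, so there is no
    canonical way to put two people's utilities on a common scale.
    """
    lo, hi = min(utilities.values()), max(utilities.values())
    if hi == lo:
        return {o: 0.5 for o in utilities}
    return {o: (u - lo) / (hi - lo) for o, u in utilities.items()}

def aggregate(people: List[Dict[Outcome, float]]) -> Dict[Outcome, float]:
    """Naive social utility: the average of normalized individual utilities.

    Equal weighting is another contestable choice; weighting by wealth or
    power instead is essentially the political problem discussed above.
    """
    outcomes = list(people[0].keys())
    norms = [normalize(p) for p in people]
    return {o: sum(n[o] for n in norms) / len(norms) for o in outcomes}

if __name__ == "__main__":
    alice = {"status quo": 0.0, "utopia": 10.0, "extinction": -1000.0}
    bob = {"status quo": 5.0, "utopia": 8.0, "extinction": -1000.0}
    print(aggregate([alice, bob]))
    # Extinction comes out lowest under any reasonable normalization,
    # which is the intuition behind the point about extinction below.
```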

While these are all tricky problems, every reasonable person assigns maximal negative utility value to any history that includes human extinction. It seems reasonable that a solution for deriving an AI utility function from human values should not cause human extinction.

I think the most difficult issue is the strong human tendency toward group identities and treating people differently based on group membership. There are groups of people who wish the extinction or subjugation of other groups of people. Filtering such wishes out of human values, in forming AI utility functions, may require some sort of "semantic surgery" on values (just to be clear, this "semantic surgery" consists of changes to the values being processed into an AI utility function, not any changes to the people holding those values).

I am familiar with the concern that an AI will have an instrumental sub-goal to gather resources in order to increase its ability to maximize its utility function, and that it may exterminate humans in order to reuse the atoms from their bodies as resources for itself. However, any reasonable human would assign maximum negative value to any history that includes exterminating humans for their atoms and so would a reasonable utility function model of human values. An AI motivated by such a utility function would gather atoms only to the extent that such behavior increases utility. Gathering atoms to the extent that it exterminates humans would have maximum negative utility and the agent wouldn't do it.

In your recent message you wrote:

... as opposed to some kluge or reinforcement learner or something that isn't even capable of taking goals like that and therefore isn't even capable of doing what we want when given superintelligence.

I am curious about your attitude toward reinforcement learning. Here you disparage it, but in Section 2.4 of Intelligence Explosion: Evidence and Import you discuss AIXI as an important milestone towards AI, and AIXI is a reinforcement learner.

One problem with reinforcement learning is that it is commonly restricted to utility functions whose values are rewards sent to the agent from the environment. I don't think this is a reasonable model of the way that real-world agents compute their utility functions. (Informally I've used reinforcement learning to refer to any agent motivated to maximize sums of discounted future utility function values, but in my mathematical papers about AGI theory I try to conform to standard usage.) The AGI-11 papers of Dewey and of Ring and Orseau also demonstrate problems with reinforcement learning. These problems can be avoided using utility function definitions that are not restricted to rewards sent from the environment to the agent.
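Schematically (my notation; the formulation in the paper differs in detail), the contrast is between maximizing rewards supplied by the environment and maximizing a utility function evaluated on the agent's own learned world model:

```latex
% Reward-based reinforcement learning: the environment sends scalar rewards
% r_t, and the agent maximizes their expected discounted sum.
\text{RL agent:}\qquad
  \max_{\pi}\; \mathbb{E}_{\pi}\!\left[\, \sum_{t} \gamma^{t} r_{t} \,\right]

% Model-based utility function (schematic): the utility u is computed by the
% agent from its own model state m_t, not read off an external reward channel.
\text{Model-based utility agent:}\qquad
  \max_{\pi}\; \mathbb{E}_{\pi}\!\left[\, \sum_{t} \gamma^{t}\, u(m_{t}) \,\right]
```

The point of the second form is that what the agent is trying to bring about is defined in terms of its model of the world, which can include a model of human values, rather than in terms of a reward channel.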


Luke:

[Apr. 20, 2012]

We seem to agree that "any summary of values that a human could write down would certainly be inaccurate" (including, e.g., hedonistic utilitarianism). You have nicely summarized some of the difficulties we face in trying to write a utility function that captures human values, but if you explained why this problem is not going to be as big a problem as I am saying it is, then I missed it. You also seem to agree that convergent instrumental goals for (e.g.) resource acquisition pose an existential threat to humanity (as argued by Omohundro 2008 and Bostrom 2012), and again if you have a solution in mind, I missed it. So I'm not sure why you're less worried about the technical problems than I am.

As for reinforcement learners: I don't "disparage" them, I just say that they aren't capable of taking human-friendly goals. Reinforcement learning is indeed an important advance in AI, but we can't build a Friendly AI from a reinforcement learner (or from other existing AI architectures), which is why AI by default will be disastrous for humans. As Dewey (2011) explains:

Reinforcement learning can only be used in the real world to define agents whose goal is to maximize expected rewards, and since this goal does not match with human goals, AGIs based on reinforcement learning will often work at cross-purposes to us. To solve this problem, we define value learners, agents that can be designed to learn and maximize any initially unknown utility function so long as we provide them with an idea of what constitutes evidence about that utility function.

(Above, I am using "reinforcement learner" in the traditional, narrower sense.)
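Roughly (my paraphrase of the idea, not Dewey's exact formula), a value learner maintains uncertainty over which utility function is intended and acts on the posterior-weighted expectation:

```latex
% Schematic value-learning agent: U ranges over a pool of candidate utility
% functions, h is the interaction history so far, and P(U | h) is the
% agent's posterior over which utility function is the intended one.
a^{*} \;=\; \arg\max_{a}\; \sum_{U \in \mathcal{U}} P\!\left(U \mid h\right)\,
            \mathbb{E}\!\left[\, U \mid h,\, a \,\right]
```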

Back to the "technical problem" of defining a human-friendly utility function and building an AI architecture capable of maximizing it. Do you have a sense for why you're less worried about the difficulty of that problem than I am?


Bill:

[Apr. 23, 2012]

I think our assessments of the relative danger from the technical versus the political problem come from our different intuitions. I do not have a proof of a solution to the technical problem. If I did, I would publish it in a paper.

In your message of 16 April 2012 you wrote:

So let's say you want to specify "Learn human values and maximize that" as a utility function. How do you do that? That seems incredibly hard to me, for reasons I can explore in detail if you wish.

Yes, please. I'd be interested if you'd like to explain your reasons.


Luke:

[May 1, 2012]

I tried to summarize some aspects of the technical difficulty of Friendly AI in "The Singularity and Machine Ethics."

But let's zoom in on the proposal to specify P as a utility function in an AI, where P = "Learn human values and maximize their satisfaction." There are two types of difficulty we might face when trying to implement this project:

  1. We know what P means but we aren't sure how to make it precise enough to encode it in a utility function.
  2. We are fundamentally, philosophically confused about what P means, and we aren't sure how to make it precise enough to encode it in a utility function.

I think we are in situation #2, which is why the technical difficulty of Friendly AI is so hard. At the moment, it looks like we'd need to solve huge swaths of philosophy before being able to build Friendly AI, and that seems difficult because most philosophical questions aren't considered to be "solved" after 2.5 millennia of work.

For example, consider the following questions:

  1. What counts as a "human"? Do those with brain injuries or neurological diseases count for these purposes? Do infants? Do anencephalic infants? If we use DNA to resurrect strains of Homo very similar to modern humans, do those people count? Suppose we can radically augment the human mind with cognitive enhancements and brain-computer interfaces in the next few decades: do heavily augmented humans count? If we get mind uploading before AI, do uploaded humans count?
  2. What are human "values"? Are we talking about the value encodings in the dopaminergic reward system and perhaps elsewhere? (If so, we're going to have trouble translating that into anything like a coherent utility function.) Are we talking about some kind of extrapolation or rational integration of our values? (If so, what's the algorithm for this extrapolation or rational integration?)
  3. "Learn human values and maximize their satisfaction" seems to assume the aggregation of the preferences of multiple humans. How do we do that?

Since I don't know what I "mean" by these things, I sure as hell can't tell an AI what I mean. I couldn't even describe it in conversation, let alone encode it precisely in a utility function. Occasionally, creative solutions to the problem are proposed, like Paul Christiano's, but they are only nascent proposals and suffer from major difficulties of their own.


Bill:

[May 9, 2012]

Yes, there are technical problems of expressing a person's values as a utility function, normalizing utility functions between different people, and combining the utility functions of different people to derive an overall AI utility function.

However, every reasonable person assigns maximally negative utility to human extinction, so over a broad range of solutions to these technical problems, the overall AI utility function will assign maximally negative value to any future history that includes human extinction.

Do you agree with this?

If you do agree, how does this lead to a high probability of human extinction?


Luke:

[June 10, 2012]

Bill, you acknowledge the technical difficulty of expressing a person's values as a utility function (see the literature on "preference acquisition"), and of combining many utility functions to derive a desirable AI utility function (see the literatures on "population ethics" and "social choice"). But if I understand you correctly, you seem to think that:

P: almost any AI utility function resulting from an attempt to combine the preferences of humans will assign maximally negative utility to any future history that includes human extinction, because humans are agreed on the badness of human extinction.

I think proposition P is false, for several reasons:

First: Most people think there are fates worse than death, so a future including human extinction wouldn't be assigned maximally negative utility.

Second: Even ignoring this, many people think human extinction would be better than the status quo (at least, if human extinction could occur without much suffering). See Benatar's Better Never to Have Been. Many negative utilitarians agree with Benatar's conclusion, if not his reasoning. The surest way to minimize conscious suffering is to eliminate beings capable of conscious suffering.

Third: There are many problems similar to problem #1 I gave above ("What counts as a 'human'?"). Suppose we manage to design an AI architecture that has preferences over states of affairs (or future histories) as opposed to preferences over a reward function, and we define a utility function for the AI that might be sub-optimal in many ways but "clearly" assigns utility 0 (or negative 1m, or whatever) to human extinction. We're still left with the problems of specification explained in The Singularity and Machine Ethics. A machine superintelligence would possess two features that make it dangerous to give it any utility function at all:

  1. Superpower: A machine superintelligence would have unprecedented powers to reshape reality, and would therefore achieve its goals with highly efficient methods that confound human expectations (e.g. it could "maximize pleasure" by tiling the universe with trillions of digital minds running a loop of a single pleasurable experience).
  2. Literalness: A machine superintelligence would recognize only precise specifications of rules and values (see Yudkowsky 2011), acting in ways that violate what feels like “common sense” to humans, and in ways that fail to respect the subtlety of human values.

We can't predict exactly what an advanced AI would do, but sci-fi stories like With Folded Hands provide concrete illustrations of the sort of thing that could go wrong as we try to tell advanced AIs "Don't let humans die" in a way that captures the hidden complexity of what we mean by that.


Bill:

[June 24, 2012]

A first point: your description of my proposition P included "almost any AI utility function resulting from an attempt to combine the preferences of humans" but what I wrote was "over a broad range of solutions to these technical problems" which is not quite the same. I think the inconsistency of individual human values and the ambiguity of combining values from multiple humans imply that there is no perfect solution. Rather there is a range of adequate solutions that fall within what we might call the "convex hull" of the inconsistency and ambiguity. I should remove the word "broad" from my proposition, because I didn't mean "almost any."

Now I address your three reasons for thinking P is false:

  1. I agree that people under duress sometimes prefer their own deaths. But at any given time most people are not under such duress and prefer life. Many who kill themselves do so because they can see no other solution to their problems. Some have diseases that cannot be cured, and would not kill themselves if a cure was available. Others are being abused or in poverty with no escape, but would not kill themselves if they could escape their abuse or poverty. A powerful AI accurately motivated by human values would provide solutions to suffering people other than suicide. Natural selection does not create beings that are generally motivated to not exist. However, I should reword "will assign maximally negative value to any future history that includes human extinction" to "will assign nearly maximal negative value to any future history that includes human extinction", to recognize that under duress humans may prefer death to some other circumstances.
  2. Benatar's Better Never to Have Been is not an accurate expression of human values. Natural selection has created ambitious human values: to survive, reproduce, and understand and control our environment.
  3. The story With Folded Hands illustrates a problem other than human extinction. My proposition was about human extinction, as illustrated by the assertion in your paper Intelligence Explosion: Evidence and Import:

Later we shall see why these convergent instrumental goals suggest that the default outcome from advanced AI is human extinction.

Furthermore, the prime directive in the story, "to serve and obey and guard men from harm," is not included in what I called "a broad range of solutions to these technical problems." It is not an attempt to resolve the inconsistency of individual human values or the ambiguity of combining values from different people. The nine-word prime directive was designed to make the story work, rather than as a serious effort to create safe AI. Humans value liberty and the mechanicals in the story are not behaving in accord with those values. In fact, the prime directive suffers from ambiguities similar to those that affect Asimov's Laws of Robotics: forced lobotomy is not serving or obeying humans.

More generally, I agree that the superpower and literalness of AI imply that poorly specified AI utility functions will lead to serious risks. The question is how narrow the target is for an AI utility function that avoids human extinction.

My proposition was about human extinction but I am more concerned about other threats. Recall that I asked to reword "It is a potential existential risk" as "It poses existential and other serious risks" in our initial agreement at the start of our dialog. I am more concerned that AI utility functions will serve a narrow group of humans.


Luke:

[June 27, 2012]

Replies below...

I didn't mean "almost any."

Oh, okay. Sorry to have misinterpreted you.

I should reword "will assign maximally negative value to any future history that includes human extinction" to "will assign nearly maximal negative value to any future history that includes human extinction",

Okay.

Benatar's Better Never to Have Been is not an accurate expression of human values. Natural selection has created ambitious human values: to survive, reproduce, and understand and control our environment.

I'm confused, so maybe we mean different things by "human values." Clearly, many humans today believe and claim that, for example, they are negative utilitarians and would prefer to minimize suffering, even if this implies eliminating all beings capable of suffering (in a painless way). Are the things you're calling "human values" not ever dependent on abstract reasoning and philosophical reflection? Are you using the term "human values" to refer only to some set of basic drives written into humans by evolution?

The story With Folded Hands illustrates a problem other than human extinction

Right, I just meant to use With Folded Hands as an example of unintended consequences aka Why It Is Very Hard to Precisely Specify What We Want.

I agree that the superpower and literalness of AI imply that poorly specified AI utility functions will lead to serious risks. The question is how narrow the target is for an AI utility function that avoids human extinction.

Right. But of course there are lots of futures we hate besides extinction, too, such as the ones where we are all lobotomized. Or the ones where we are kept in zoos and AI-run science labs. Or the ones where the AIs maximize happiness by rejiggering our dopamine reward systems and then leave us to lie still on the ground attached to IVs. Or the ones where we get everything we want except for the one single value we call "novelty," and thus the AI tiles the solar system with tiny digital minds replaying nothing but the single most amazing experience again and again until the Sun explodes.

I am more concerned that AI utility functions will serve a narrow group of humans.

So, I'm still unsure how much we disagree. I'm very worried about AI utility functions designed to serve only a narrow group of humans, but I'm just as worried about people who try to build an AGI with an altruistic utility function who simply don't take the problems of unintended consequences seriously enough.

This has been a helpful and clarifying conversation. Do you think there's more for us to discuss at this time?


Bill:

[July 7, 2012]

This has been a helpful and clarifying conversation. Do you think there's more for us to discuss at this time?

Yes, this has been a useful dialog and we have both expressed our views reasonably clearly. So I am happy to end it for now.

Comments

Bill:

Currently we do not know how to build intelligent machines. When we do then we can apply those machines to learning human values. If a machine is sufficiently intelligent to pose an existential threat to humanity then it is sufficiently intelligent to learn human values.

Luke:

I generally agree that superintelligent machines capable of destroying humanity will be capable of learning human values and maximizing for them if "learn human values and maximize for them" is a coherent request at all. But capability does not imply motivation.

It seems like you were talking past each other here, and never got that fully resolved. Bill is entirely correct that a sufficiently intelligent machine would be able to learn human values. A UFAI might be motivated to do so for the purpose of manipulating humans. Making the AGI motivated to maximize human values is the hard part.

A recent StackExchange discussion suggests that self-improving general problem solvers are considered infeasible by relevant experts, i.e., computer scientists. This suggests that the risk of UFAI should be updated downwards.

They just say that, in their current form, their algorithms are too inefficient. That hardly sounds like the same thing!

I found this to be a really constructive debate - thank you Bill and Luke!

This was excellent.

I would agree that Benatar's "Better Never to Have Been" is not an accurate expression of human values. That's philosophical meandering by one - perhaps depressed - individual. A few depressed people might want not to exist - but they don't represent human values very well.

I am more concerned that AI utility functions will serve a narrow group of humans.

I just talked about that in this comment, but it's more relevant to this thread, so I'll copy it here:

One could imagine an organization conspiring to create AGI that will optimize for the organization's collective preferences rather than humanity's collective preferences, but this won't happen because:

  1. No one will throw a fit and defect from an FAI project because they won't be getting special treatment, but people will throw a fit if they perceive unfairness, so Friendly-to-humanity-AI will be a lot easier to get funding and community support for than friendly-to-exclusive-club-AI.
  2. Our near mode reasoning cannot comprehend how much better a personalized AGI slave would be over FAI for us personally, so people will make that sort of decision in far mode, where idealistic values can outweigh greediness.

Finally, even if some exclusive club did somehow create an AGI that was friendly to them in particular, it wouldn't be that bad. Even if people don't care about each other very much, we do at least a little bit. Let's say that an AGI optimizing an exclusive club's CEV devotes .001% of its resources to things the rest of humanity would care about, and the rest to the things that just the club cares about. This is only worse than FAI by a factor of 10^5, which is negligible compared to the difference between FAI and UFAI.

Substitute "company" for "AI project" and look at what happens to the first argument.

Good point.

I agree with many of Bill's comments. I too am more concerned about a "1% machine" than a technical accident that destroys civilization. A "1% machine" seems a lot more likely. This is largely a "values" issue. Some would argue that a "0%" outcome would be exceptionally bad - whereas a "1%" outcome would be "OK" - and thus is acceptable, according to the "maxipok" principle.

Just to clarify - by "1% machine" do you mean a machine which serves (the most powerful) 1% of humanity?

There's definitely a values issue as to how undesirable such an outcome would be compared to human extinction. I think there's also substantial disagreement between Bill & Luke about the relative probabilities of those outcomes though.

(As we've seen from the Hanson/Yudkowsky foom debate, drilling down to find the root cause of that kind of disagreement is really hard).

Just to clarify - by "1% machine" do you mean a machine which serves (the most powerful) 1% of humanity?

Yes, that's right.

Reinforcement learning uses a single scalar reward - by definition. Animals don't really work like that - they have many pain sensors whose signals are never combined into a single value, since some of them never get further than spinal cord reflexes.

However, the source of the reward does not seem to me to be a piece of RL dogma - though sure, some models put the reward in the agent's "environment".