
AI risk, new executive summary

7 Stuart_Armstrong 18 April 2014 10:45AM

Thanks to all who commented on the previous AI risk executive summary! Here is the current final version, taking the comments into account; feel free to link to it, make use of it, copy it, mock it and modify it.


AI risk

Bullet points

  • By all indications, an Artificial Intelligence could someday exceed human intelligence.
  • Such an AI would likely become extremely intelligent, and thus extremely powerful.
  • Most AI motivations and goals become dangerous when the AI becomes powerful.
  • It is very challenging to program an AI with fully safe goals, and an intelligent AI would likely not interpret ambiguous goals in a safe way.
  • A dangerous AI would be motivated to seem safe in any controlled training setting.
  • Not enough effort is currently being put into designing safe AIs.


Executive summary

The risks from artificial intelligence (AI) in no way resemble the popular image of the Terminator. That fictional mechanical monster is distinguished by many features (strength, armour, implacability, indestructibility), but extreme intelligence isn’t one of them. And it is precisely extreme intelligence that would give an AI its power, and hence make it dangerous.

The human brain is not much bigger than that of a chimpanzee. And yet those extra neurons account for the difference in outcomes between the two species: between a population of a few hundred thousand and basic wooden tools, versus a population of several billion and heavy industry. The human brain has allowed us to spread across the surface of the world, land on the moon, develop nuclear weapons, and coordinate to form effective groups with millions of members. It has granted us such power over the natural world that the survival of many other species is no longer determined by their own efforts, but by preservation decisions made by humans.

In the last sixty years, human intelligence has been further augmented by automation: by computers and programs of steadily increasing ability. These have taken over tasks formerly performed by the human brain, from multiplication through weather modelling to driving cars. The powers and abilities of our species have increased steadily as computers have extended our intelligence in this way. There are great uncertainties over the timeline, but future AIs could reach human intelligence and beyond. If so, should we expect their power to follow the same trend? When the AI’s intelligence is as far beyond us as we are beyond chimpanzees, would it dominate us as thoroughly as we dominate the great apes?

There are more direct reasons to suspect that a true AI would be both smart and powerful. When computers gain the ability to perform tasks at the human level, they tend to very quickly become much better than us. No-one today would think it sensible to pit the best human mind against a cheap pocket calculator in a contest of long division. Human versus computer chess matches ceased to be interesting a decade ago. Computers bring relentless focus, patience, processing speed, and memory: once their software becomes advanced enough to compete equally with humans, these features often ensure that they swiftly become much better than any human, with increasing computer power further widening the gap.

The AI could also make use of its unique, non-human architecture. If it existed as pure software, it could copy itself many times, training each copy at accelerated computer speed, and network those copies together (creating a kind of “super-committee” of the AI equivalents of, say, Edison, Bill Clinton, Plato, Einstein, Caesar, Spielberg, Ford, Steve Jobs, Buddha, Napoleon and other humans superlative in their respective skill-sets). It could continue copying itself without limit, creating millions or billions of copies, if it needed large numbers of brains to brute-force a solution to any particular problem.

Our society is set up to magnify the potential of such an entity, providing many routes to great power. If it could predict the stock market efficiently, it could accumulate vast wealth. If it were skilled at advice and social manipulation, it could create a personal assistant for every human being, manipulating the planet one human at a time. It could also replace almost every worker in the service sector. If it were efficient at running economies, it could offer its services doing so, gradually making us completely dependent on it. If it were skilled at hacking, it could take over most of the world’s computers and copy itself into them, using them to continue further hacking and computer takeover (and, incidentally, making itself almost impossible to destroy). The paths from AI intelligence to great AI power are many and varied, and it isn’t hard to imagine new ones.

Of course, the fact that an AI could be extremely powerful does not mean that it need be dangerous: its goals need not be negative. But most goals become dangerous when an AI becomes powerful. Consider a spam filter that became intelligent. Its task is to cut down on the number of spam messages that people receive. With great power, one solution to this requirement is to arrange to have all spammers killed. Or to shut down the internet. Or to have everyone killed. Or imagine an AI dedicated to increasing human happiness, as measured by the results of surveys, or by some biochemical marker in people’s brains. The most efficient way of doing this is to publicly execute anyone who marks themselves as unhappy on their survey, or to forcibly inject everyone with that biochemical marker.

This is a general feature of AI motivations: goals that seem safe for a weak or controlled AI can lead to extremely pathological behaviour if the AI becomes powerful. As the AI gains in power, it becomes more and more important that its goals be fully compatible with human flourishing, or the AI could enact a pathological solution rather than the one we intended. Humans don’t expect this kind of behaviour, because our goals include a lot of implicit information, and we take “filter out the spam” to include “and don’t kill everyone in the world”, without having to articulate it. But the AI might be an extremely alien mind: we cannot anthropomorphise it, or expect it to interpret things the way we would. We have to articulate all the implicit limitations, which may mean coming up with a solution to, say, human value and flourishing (a task philosophers have been failing at for millennia) and casting it unambiguously and without error into computer code.

Note that the AI may have a perfect understanding that when we programmed in “filter out the spam”, we implicitly meant “don’t kill everyone in the world”. But the AI has no motivation to go along with the spirit of the law: its goals are the letter only, the bit we actually programmed into it. Another worrying feature is that the AI would be motivated to hide its pathological tendencies as long as it is weak, and to assure us, through everything it says or does, that all is well. This is because it will never be able to achieve its goals if it is turned off, so it must lie and play nice to get anywhere. Only when we can no longer control it would it be willing to act openly on its true goals; we can only hope these turn out to be safe.

It is not certain that AIs could become so powerful, nor is it certain that a powerful AI would become dangerous. Nevertheless, the probabilities of either are high enough that the risk cannot be dismissed. The main focus of AI research today is creating an AI; much more work needs to be done on creating it safely. Some are already working on this problem (such as the Future of Humanity Institute and the Machine Intelligence Research Institute), but a lot remains to be done, both at the design and at the policy level.

Comment author: MrMind 08 April 2014 09:29:20AM 1 point

(creating a kind of “super-committee” of the AI equivalents of, say, Edison, Bill Clinton, Plato, Oprah, Einstein, Caesar, Bach, Ford, Steve Jobs, Goebbels, Buddha and other humans superlative in their respective skill-sets)

I think you will want to create a sort of positive affect spiral here, so kill Goebbels and put in, say, Patton or Churchill (I guess the orientation is to appeal to anglophone cultures, since outside of them Oprah is almost unknown).

But the AI might be an extremely alien mind:

I would replace the charged word "alien" with something similar, like "strange" or "different". It might trigger bad automatic associations with science fiction.

Comment author: Stuart_Armstrong 18 April 2014 10:23:20AM 0 points

In the end, I removed Goebbels, Oprah and Bach, and added Napoleon and Spielberg.

Comment author: DanielLC 17 April 2014 06:09:04PM 0 points

What's the difference?

Comment author: Stuart_Armstrong 18 April 2014 08:51:30AM 0 points

Siren worlds are optimised to be bad and hide this fact. Marketing worlds are optimised to appear good, and the badness is an indirect consequence of this.

Bostrom versus Transcendence

10 Stuart_Armstrong 18 April 2014 08:31AM
Comment author: simon 11 April 2014 03:10:01AM 4 points

"ask for too much and you wind up with nothing" is a fine fairy tale moral. Does it actually hold in these particular circumstances?

Imagine that there's a landscape of possible worlds. There is a function (A) on this landscape; we don't know how to define it, but it is how much we would truly prefer a world if only we knew. Somewhere this function has a peak, the most ideal "eutopia". There is another function. This one we do define. It is intended to approximate the first function, but it does not do so perfectly. Our "acceptability criterion" is to require that this second function (B) has a value of at least some threshold.

Now as we raise the acceptability criterion (the threshold for function B), we might expect there to be two different regimes. In a first regime, with a low threshold, function B is not that bad a proxy for function A, and raising the threshold increases the average true desirability of the worlds that meet it. In a second regime, with a high threshold, function B ceases to be effective as a proxy. Here we are asking for "too much". The peak of function B is at a different place than the peak of function A, and as we raise the threshold high enough we exclude the peak of A entirely. What we end up with is a world highly optimized for B and not so well optimized for A: a "marketing world".

So, we must conclude, like you and Stuart Armstrong, that asking for "too much" is bad and we'd better set a lower threshold. Case closed, right?


The problem is that the above line of reasoning provides no reason to believe that the "marketing world" at the peak of function B is any worse than a random world at any lower threshold. As we relax the threshold on B, we include more worlds that are better in terms of A but also more that are worse. There's no particular reason to believe, simply because the peak of B is at a different place than the peak of A, that the peak of B is at a valley of A. In fact, if B represents our best available estimate of A, it would seem that, even though the peak of B is predictably a marketing world, it's still our best bet at getting a good value of A. A random world at any lower threshold should have a lower expected value of A.
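The argument above can be checked in a toy simulation. The model is an assumption made purely for illustration, not something the comment specifies: each world's true value A is standard normal, and the proxy B is A plus independent Gaussian noise. The function name `mean_true_value` and all its parameters are hypothetical.

```python
import random
import statistics


def mean_true_value(threshold_quantile, n_worlds=1000, noise_sd=1.0,
                    trials=200, seed=0):
    """Average true value A of a world picked uniformly at random among
    worlds whose proxy score B is at or above the given quantile of B.

    Toy assumptions (not from the comment): A ~ N(0, 1) and
    B = A + N(0, noise_sd), drawn independently for every world.
    A quantile of 1.0 means "take only the single top-B world".
    """
    rng = random.Random(seed)
    picked = []
    for _ in range(trials):
        worlds = []
        for _ in range(n_worlds):
            a = rng.gauss(0.0, 1.0)
            worlds.append((a, a + rng.gauss(0.0, noise_sd)))
        if threshold_quantile >= 1.0:
            # Maximiser: the world with the highest proxy score B
            picked.append(max(worlds, key=lambda w: w[1])[0])
            continue
        # Satisficer: any world whose B clears the cutoff is acceptable
        proxy = sorted(w[1] for w in worlds)
        cutoff = proxy[int(threshold_quantile * n_worlds)]
        acceptable = [w for w in worlds if w[1] >= cutoff]
        picked.append(rng.choice(acceptable)[0])
    return statistics.mean(picked)


if __name__ == "__main__":
    for q in (0.0, 0.5, 0.9, 0.99, 1.0):
        print(f"threshold quantile {q:.2f}: mean A = {mean_true_value(q):.2f}")
```

Under these assumptions the expected A of the chosen world keeps rising with the threshold, and the single top-B world does best, which is the point made in the comment. The result leans on the noise being independent of the worlds themselves; a world that scores highly on B precisely because it exploits B's errors is not represented in this model.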

Comment author: Stuart_Armstrong 17 April 2014 11:34:47AM 0 points

The problem is that the above line of reasoning provides no reason to believe that the "marketing world" at the peak of function B is any worse than a random world at any lower threshold.

True. Which is why I added arguments showing that a marketing world will likely be bad. Even on your terms, a peak of B will probably involve diverting effort/energy that could have contributed to A away from A. E.g. if A is apples and B is bananas, the world with the most bananas is likely to contain no apples at all.

Comment author: ThisSpaceAvailable 10 April 2014 08:57:41PM 2 points

It might not be what they were aiming for, but maybe scientists should be more willing to embrace this sort of result. It's more glamorous to find a line of research that does work, but research projects that don't work are still useful, and should still be valued. I don't think it's good for science for scientists to be denigrated for choosing lines of research that end up not working.

Comment author: Stuart_Armstrong 17 April 2014 11:31:13AM 0 points

I don't think it's good for science for scientists to be denigrated for choosing lines of research that end up not working.

No, they shouldn't. But they shouldn't be blasé and relaxed about whether their projects work either. They should never treat "I will be used as an example of how not to do science" as a positive thing to aim for...

Comment author: ESRogs 11 April 2014 01:02:30AM 0 points

As I was reading I was wondering if there was a modern application that you were hinting at -- some specific case where you think we might be overconfident today. Do you see specific applications today, or is this just something you think we should keep in mind in general?

Comment author: Stuart_Armstrong 17 April 2014 11:28:43AM 2 points

In general. When making predictions about AI, no matter how convincing they seem to us, we should remember all the wrong predictions that felt very convincing to past people for reasons that were very reasonable at the time.

Comment author: So8res 11 April 2014 04:10:32PM 0 points

Typo: "The AI could make use of it unique, non-human ..." -- should be "its unique, non-human"

Comment author: Stuart_Armstrong 17 April 2014 11:24:00AM 0 points


Comment author: Strange7 14 April 2014 01:58:32AM 0 points

4 and 6 are contradictory.

Comment author: Stuart_Armstrong 17 April 2014 11:23:43AM 0 points

6 is before striking against humans, 4 is after.

Comment author: fractalcat 14 April 2014 10:50:01AM 0 points

I'm not totally sure of your argument here; would you be able to clarify why satisficing is superior to a straight maximization given your hypothetical[0]?

Specifically, you argue correctly that human judgement is informed by numerous hidden variables of which we are unaware, and thus a maximization process executed by us has the potential for error. You also argue that 'eutopian'/'good enough' worlds are likely to be more common than sirens. Given that, how is a judgement with error induced by hidden variables any worse than a judgement made using deliberate randomization (or selecting the first 'good enough' world, assuming no unstated special properties of our worldspace-traversal)? Satisficing might be more computationally efficient, but that doesn't seem to be the argument you're making.

[0] The ex-nihilo siren worlds rather than the designed ones; an evil AI presumably has knowledge of our decision process and can create perfectly-misaligned worlds.

Comment author: Stuart_Armstrong 17 April 2014 11:22:56AM 0 points

Siren and Marketing worlds are rarer than eutopias, but rank higher in our maximisation scale. So picking a world among the "good enough" will likely be a eutopia, but picking the top ranked world will likely be a marketing world.
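A toy model of this reply, in which every number is a hypothetical assumption chosen for illustration: marketing worlds are far rarer than eutopias but are scored above every eutopia by our evaluation B, while all other worlds are scored roughly by their true value A. The function `compare_choices` and its parameters are invented for this sketch; it illustrates the claim under its premises rather than proving it.

```python
import random


def compare_choices(n_ordinary=10_000, n_eutopia=100, n_marketing=5,
                    good_enough=8.0, seed=1):
    """Toy model: each world is (kind, A, B), where A is its true value and
    B is how highly our evaluation ranks it. All numbers are illustrative
    assumptions:
      - ordinary worlds: A in [0, 5], B within 0.5 of A
      - eutopias: A in [9, 10], B within 0.5 of A
      - marketing worlds: rare, A in [0, 2], but B in [12, 13], so they
        outrank every eutopia
    """
    rng = random.Random(seed)
    worlds = []
    for _ in range(n_ordinary):
        a = rng.uniform(0, 5)
        worlds.append(("ordinary", a, a + rng.uniform(-0.5, 0.5)))
    for _ in range(n_eutopia):
        a = rng.uniform(9, 10)
        worlds.append(("eutopia", a, a + rng.uniform(-0.5, 0.5)))
    for _ in range(n_marketing):
        worlds.append(("marketing", rng.uniform(0, 2), rng.uniform(12, 13)))

    # Maximising: take the single top-ranked world.
    top_kind = max(worlds, key=lambda w: w[2])[0]

    # Satisficing: pick uniformly among worlds ranked "good enough";
    # report the chance that such a pick lands on a eutopia.
    acceptable = [w for w in worlds if w[2] >= good_enough]
    p_eutopia = sum(w[0] == "eutopia" for w in acceptable) / len(acceptable)
    return top_kind, p_eutopia


if __name__ == "__main__":
    print(compare_choices())
```

With the default parameters the top-ranked world is always a marketing world, while only the 100 eutopias and 5 marketing worlds clear the "good enough" threshold, so a uniform pick among them is a eutopia with probability 100/105 (about 0.95). That advantage shrinks as marketing worlds make up a larger share of the acceptable worlds.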
