[Written for a general audience. You can probably skip the first section. Posted for feedback/comment before publication on The Roots of Progress. Decided not to publish as-is, although parts of this have been or may be used in other essays.]

Will AI kill us all?

That question is being debated seriously by many smart people at the moment. Following Charles Mann, I’ll call them the wizards and the prophets: the prophets think that the risk from AI is so great that we should actively slow or stop progress on it; the wizards disagree.

Why even discuss this?

(If you are already very interested in this topic, you can skip this section.)

Some of my readers will be relieved that I am finally addressing AI risk. Others will think that an AI apocalypse is classic hysterical pessimist doomerism, and they will wonder why I am even dignifying it with a response, let alone taking it seriously.

A few reasons:

It’s important to take safety seriously

Safety is a value. New technologies really do create risk, and the more powerful we get, the bigger the risk. Making technology safer is a part of progress, and we should celebrate it. Doomer pessimism is generally wrong, but so is complacent optimism. We should be prescriptive, not descriptive optimists, embracing solutionism over complacency.

We shouldn’t dismiss arguments based on vibes

Or mood affiliation, or who is making the argument, or what kind of philosophy they seem to be coming from. Our goal is to see the truth clearly. And the fact that doomer arguments have always been wrong doesn’t mean that this one is.

The AI prophets are not typical doomers

They are generally pro-technology, pro-human, and not fatalistic. Nor are they prone to authoritarianism; many lean libertarian. And their arguments are intelligent and thoroughly thought-out.

Many of the arguments against them are bad

Many people (not mentioned in this post) are not thinking clearly and are being fairly sloppy.

So I want to address this.

The argument

I boil it down to three main claims:

AI will become a superintelligent agent

It will be far smarter than any human being, quantitatively if not qualitatively. And some forms of the AI will have goal-directed behavior.

This does not require computers to be conscious (merely that they be able to do things that right now only conscious beings can do). It does not require them to have a qualitatively different form of “intelligence”: it could be enough for them to be as smart as a brilliant human, able to read everything ever written and have perfect recall of it, able to think 1000x faster, able to fork into teams that work on things simultaneously, etc.

The AI’s goals will not be aligned with ours

This is the principal-agent problem again. Whatever it is aiming at will not be exactly what we want. We won’t be able to give it perfect instructions. We will not be able to train it to obey the law. We won’t even be able to train it to follow basic human morality, like “don’t kill everyone.”

This does not require it to have free will to choose its goals, or otherwise to depart from following the training we have given it. Like a genie or a monkey’s paw, it might do exactly what we ask for, in a way that is not at all what we wanted—following the letter of our instructions, but destroying the spirit.

All our prevention and countermeasures will fail

If we test AI in a box before letting it out into the real world, our tests will miss crucial problems. If we try to keep it in a box forever, it will talk its way out (and by the way, we’re not even trying to do that). If we try to limit the AI’s power, it will evade those limitations. If we try to turn it off, it will stop us. If we try to use some AIs as police to watch the other AIs, they will instead collude with each other and conspire against us. In fact, it might anticipate all of the above and conclude that the easiest path is just to launch a sneak attack on humanity and kill us all to get us out of the way.

And whatever happens might happen so fast that we don’t get a chance to learn from failure. There will be no Hindenburg or Tacoma Narrows Bridge or Chernobyl as a cautionary example. There will be no warning shot, no failed robot uprising. The very first time AI takes action against us, it will wipe us all out.

Analogies

In “Four lenses on AI risks”, I gave the analogy that AI might be like expansionary Western empires when they clashed with other civilizations, or like humans when they arrived on the evolutionary scene, wiping out the Neanderthals and hunting many megafauna to extinction.

A related argument is that if you would be worried about an advanced alien civilization coming to Earth, you should worry about AI.

What’s different this time

People have always been worried that new technologies would cause catastrophe. But so far, technology has done far more good than harm overall. What might be different this time?

Relatedly, why worry about AI instead of an asteroid impact, an antibiotic-resistant superbug, etc.?

The crux is the power of intelligence. Humans have been able so far to overcome every challenge because of the power of our intelligence. We can beat natural disasters: drought and famine, storm and flood. We can beat wild animals. We can beat bacteria and viruses. We can make cars, planes, drugs, and X-rays safe. Nature is no match for us because intelligence trumps everything. David Deutsch says that “anything not forbidden by the laws of nature is achievable, given the right knowledge.”

If AI goes rogue, we are for the first time up against an intelligent adversary. We’re not mastering indifferent nature; we’re potentially up against something that has a world-model, that can create and execute plans.

Arguably, the more optimistic you are about the ability of humans to overcome any challenge, the more worried you should be about any non-human thing gaining that same ability.

The crux is epistemic

Why do smart people disagree so much on this?

Eliezer is certain we are doomed. Zvi thinks it’s very likely. Scott Alexander gives it a 33% chance (which means we still have a 2/3 chance to survive!). On the other hand, Scott Aaronson implies that his probability is under 2%; Tyler Cowen says that we just can’t know; Pinker is dismissive of all the arguments.

I think the deepest crux here is epistemological: how well do we understand this issue, how much can we say about it, and what can we predict?

The prophets think that, based on the nature of intelligence, the entire argument above is obviously correct. Most of the argument can be boiled down to a simple syllogism: the superior intelligence is always in control; as soon as AI is more intelligent than we are, we are no longer in control.

The wizards think that we are more in a realm of Knightian uncertainty. There are too many unknown unknowns. We can’t make any confident projections of what will happen. Any attempt to do so is highly speculative. If we were to give equal weight to all hypotheses with equal evidence, there would be an epistemically unmanageable combinatorial explosion of scenarios to consider.

There is then a further disagreement about how to talk about such scenarios. Adherents of Bayesian epistemology want to put a probability on everything, no matter how far removed from evidence. Neo-Popperians like David Deutsch think that even suggesting such probabilities is irrational, that attempting inferences beyond the “reach” of our best explanations is unwarranted—appropriately, the term Popper used for this was “prophecy.”

Eliezer thinks that this is like orbital mechanics: we see an asteroid way out in the distance, we calculate its trajectory, we know from physics that it is going to destroy the Earth.

Why I’m skeptical of the prophecy

Orbital mechanics is very simple and well-understood. The situation with AI is complex and poorly understood.

What could a superintelligence really do? The prophets’ answer seems to be “pretty much anything.” Any sci-fi scenario you can imagine, like “diamondoid bacteria that infect all humans, then simultaneously release botulinum toxin.” In this view, as intelligence increases without limit, it approaches omnipotence. But this is not at all obvious to me.

The same view is behind the argument that all our prevention and countermeasures will fail: the AI will outsmart you, manipulate you, outmaneuver you, etc. As Scott Aaronson points out, this is a “fully general counterargument” to anything that might work.

When we think about Western empires or alien invasions, what makes one side superior is not raw intelligence, but the results of that intelligence compounded over time, in the form of science, technology, infrastructure, and wealth. Similarly, an unaided human is no match for most animals. AI, no matter how intelligent, will not start out with a compounding advantage.

Similarly, will we really have no ability to learn from mistakes? One of the prophets’ worries is “fast takeoff”, the idea that AI progress could go from ordinary to godlike literally overnight (perhaps through “recursive self-improvement”). But in reality, we seem to be seeing a “slow takeoff,” as some form of AI has arrived and we actually have time to talk and worry about it (even though Eliezer claims that fast takeoff has not yet been invalidated).

If some rogue AI were to plot against us, would it actually succeed on the first try? Even genius humans generally don’t succeed on the first try of everything they do. The prophets think that AI can deduce its way to victory—the same way they think they can deduce their way to predicting such outcomes.

Proceed, with caution

We always have to act, even in the face of uncertainty—even Knightian uncertainty.

We also have to remember that the potential advantages of AI are as great as its risks. If it is as powerful as its worst critics fear, then it is also powerful enough to give us abundant clean energy, cheap manufacturing and construction, fast and safe transportation, and the cure for all disease. Remember that no matter what, we’re all going to die eventually, until and unless we cure aging itself.

If we did see an alien fleet approaching us, would we try to hide? If they weren’t even on course for us, but were going to pass us by, would we stay silent, or call out to them? Personally, I would want to meet them and to learn from them. And yes, without some evidence of hostile intent on their part, I would risk our civilization to not pass up that defining moment.

Scott Aaronson defines someone’s “Faust parameter” as “the maximum probability they’d accept of an existential catastrophe in order that we should all learn the answers to all of humanity’s greatest questions,” adding “I confess that my Faust parameter might be as high as 0.02.” I sympathize.

None of the above means “damn the torpedoes, full speed ahead.” Testing and AI safety work are all valuable. It is good to occasionally hold an Asilomar conference. It’s good to think through the safety implications of new developments before even working on them, as Kevin Esvelt did for the gene drive. We can do “reform” vs. “orthodox” AI safety. (And note that OpenAI spent several months testing GPT-4 before its release.)

So, proceed with caution. But proceed.

11 comments

If some rogue AI were to plot against us, would it actually succeed on the first try? Even genius humans generally don’t succeed on the first try of everything they do. The prophets think that AI can deduce its way to victory—the same way they think they can deduce their way to predicting such outcomes.

I think a weaker thing--I think that if a rogue AI plots against us and fails, this will not spur the relevant authorities to call for a general halt. Instead that bug will be 'patched', and AI development will continue until we create one that does successfully plot against us.

[Edit] Eliezer talks sometimes about the "Law of Continued Failure"; if your system was sane enough to respond to a warning shot with an adequate response, it was probably sane enough to do the right thing before the warning shot. I think that the current uncertainty about whether or not AI would go rogue will continue even after an AI system goes rogue, in much the same way uncertainty about whether or not gain-of-function research is worth it continues even after the COVID pandemic.

Personally, I want to get to the glorious transhumanist future as soon as possible as much as anybody, but if there's a chance that AI kills us all instead, that's good enough for me to say we should be hitting pause on it. 

I don't wanna pull the meme phrase on people here, but if it's ever going to be said, now's the time: "Won't somebody please think of the children?"

Any chance? A one in a million chance? 1e-12? At some point you should take the chance. What is your Faust parameter?

It depends at what rate the chance can be decreased. If it takes 50 years to shrink it from 1% to 0.1%, then with all the people that would die in that time, I'd probably be willing to risk it. 

As of right now, even the most optimistic experts I've seen put p(doom) at much higher than 1% - far into the range where I vote to hit pause.

Remember that no matter what, we’re all going to die eventually, until and unless we cure aging itself.

 

Not necessarily, there are other options. For example cryonics.

Which I think is important. If our only groups of options were:

1) Release AGI which risks killing all humans with high probability or

2) Don't do it until we're confident it's pretty safe, and each human dies before they turn 200.

I can see how some people might think that option 2) guarantees the universe loses all value for them personally, and choose 1) even if it's very risky.

However, we also have the following option:

3) Don't release AGI until we're confident it's pretty safe. But do our best to preserve everyone so that they can be revived when we do.

I think this makes waiting much more palatable - even those who care only about some humans currently alive are better off waiting to release AGI until it's at least as likely to succeed as cryonics.

(also, working directly on solving aging while waiting on AGI might have a better payoff profile than rushing AGI anyways)

But we have no idea if our current cryonics works. It's not clear to me whether it's easier to solve that or to solve aging.

I think it should be much easier to get a good estimate of whether cryonics would work. For example:

  • if we could simulate individual c. elegans then we know pretty well what kind of info we need to preserve
  • then we can check if we're preserving it (even if current methods for extracting all relevant info won't work for a whole human brain because they're way too slow)

And it's a much less risky path than doing AGI quickly. So I think it's a mitigation it'd be good to work on, so that waiting to make AI safer is more palatable.

What could a superintelligence really do? The prophets’ answer seems to be “pretty much anything.” Any sci-fi scenario you can imagine, like “diamondoid bacteria that infect all humans, then simultaneously release botulinum toxin.” In this view, as intelligence increases without limit, it approaches omnipotence. But this is not at all obvious to me.

 

I can easily imagine various ways for a bright human who can spin up subprocesses and micromanage on a large scale to take over the world. For example, I imagine it should be fairly easy for an AI to straight-out talk the majority of people into playing along with whatever it has in mind.

More generally, I find the post quite good and thoughtfully written, but I find the arguments brought in "Why I'm skeptical of the prophecy" weak. 

The same view is behind the argument that all our prevention and countermeasures will fail: the AI will outsmart you, manipulate you, outmaneuver you, etc. As Scott Aaronson points out, this is a “fully general counterargument” to anything that might work.

Pretty much, but that doesn't make it invalid: If someone who's smarter (more precisely, has a broader range of options and knows how to use them) than you is coming for you, you're in trouble. That's not saying that e.g. AI-boxing wouldn't be strictly better than not doing it, but it doesn't keep you safe if the AI is sufficiently smart and has any way of interacting with the world.

When we think about Western empires or alien invasions, what makes one side superior is not raw intelligence, but the results of that intelligence compounded over time, in the form of science, technology, infrastructure, and wealth.

This seems to be unfounded. For one, I'm not sure we ever had a situation where two empires had a substantial difference in pure intellectual capability and everything else was kept equal. Additionally, if you look at the Spanish conquests of the Americas, they did a scary amount of conquering simply by outsmarting the locals. You could argue that they had prior real-world experience, but I don't think that's such an easy argument to sustain.

 AI, no matter how intelligent, will not start out with a compounding advantage. 

If it has access to the internet, you can assume it starts with all of human knowledge plus the ability to quickly process, organize and re-combine it, which already puts it beyond the capabilities of humanity.

Most of the argument can be boiled down to a simple syllogism: the superior intelligence is always in control; as soon as AI is more intelligent than we are, we are no longer in control.

Seems right to me.  And it's a helpful distillation. 

When we think about Western empires or alien invasions, what makes one side superior is not raw intelligence, but the results of that intelligence compounded over time, in the form of science, technology, infrastructure, and wealth. Similarly, an unaided human is no match for most animals. AI, no matter how intelligent, will not start out with a compounding advantage.

Similarly, will we really have no ability to learn from mistakes? One of the prophets’ worries is “fast takeoff”, the idea that AI progress could go from ordinary to godlike literally overnight (perhaps through “recursive self-improvement”). But in reality, we seem to be seeing a “slow takeoff,” as some form of AI has arrived and we actually have time to talk and worry about it (even though Eliezer claims that fast takeoff has not yet been invalidated).

If some rogue AI were to plot against us, would it actually succeed on the first try? Even genius humans generally don’t succeed on the first try of everything they do. The prophets think that AI can deduce its way to victory—the same way they think they can deduce their way to predicting such outcomes.

I'm not seeing how this is conceptually distinct from the existing takeoff concept. 

  • Aren't science, technology, infrastructure, and wealth merely intelligence + time (+ matter)?
  • And compounding, too, is just intelligence + time, no?
  • And whether the rogue AI succeeds on its first attempt at a takeover just depends on its intelligence level at that time, right? Like a professional chess player will completely dominate me in a chess match on their first try because our gap in chess intelligence is super large. But that pro chess player competing against their adjacently-ranked competitor won't likely result in such a dominating outcome, right?

I'm failing to see how you've changed the terms of the argument?

Is it just that you think slow takeoff is more likely?

Chess is a simple game and a professional chess player has played it many, many times. The first time a professional plays you is not their “first try” at chess.

Acting in the (messy, complicated) real world is different.