Several high-profile AI skeptics and fellow travelers have recently raised the objection that it is inconceivable that a hostile AGI or smarter-than-human intelligence could end the human race. Some quotes from earlier this year:

Scott Aaronson:

The causal story that starts with a GPT-5 or GPT-4.5 training run, and ends with the sudden death of my children and of all carbon-based life, still has a few too many gaps for my aging, inadequate brain to fill in

Michael Shermer:

Halting AI is ridiculous. I have read the AI doomsayer lit & don’t see a pathway from AI to extinction, civ termination or anything remotely like absurd scenarios like an AI turning us all into paperclips (the so-called alignment problem)

Noah Smith:

why aren’t ChatGPT, Bing, and their ilk going to end humanity? Well, because there’s actually just no plausible mechanism by which they could bring about that outcome. ... There is no plausible mechanism for LLMs to end humanity

"Just turn the computer off, bro"

The gist of these objections to the case for AI risks is that AI systems as we see them today are merely computer programs, and in our everyday experience computers are not dangerous, and certainly not dangerous to the point of bringing about the end of the world. People who first encounter this debate often fixate on the fact that computers don't have arms and legs, and therefore can't hurt us.

There are responses to these criticisms that center around advanced, "magical" technologies like nanotechnology and AIs paying humans to mix together cocktails of proteins to make a DNA-based nanoassembler or something.

But I think those responses are probably wrong, because you don't actually need "magical" technologies to end the world. Fairly straightforward advances in mundane weapons like drones, cyberweapons, bioweapons and robots are sufficient to kill people en masse, and the real danger is AI strategists that are able to deploy lots of these mundane weapons and execute a global coup d'etat against humanity.

In short, our defeat by the coming machine empire will not only be nonmagical and legible, it will be downright boring. Farcical, even.

Ignominious Defeat

Lopsided military conflicts are boring. The Conquistadors didn't do anything magical to defeat the Aztecs, actually. They had a big advantage in disease resistance and in military tech like gunpowder and steel, but everything they did was fundamentally normal - attacks, sieges, etc. They had a few sizeable advantages, and that was enough to collapse the relatively delicate geopolitical balance that the Aztecs were sitting on top of.

Similarly, humans have killed 80% of all chimps in about a century and they are now critically endangered. But we didn't need to drop an atom bomb or something really impressive to achieve that effect. The biggest threats to the chimpanzee are habitat destruction, poaching, and disease - i.e. we (humans) are successfully exterminating chimps even though it is actually illegal to kill chimps by human law! We are killing them without even trying, in really boring ways, without really expending any effort.

Once you have technology for making optimizing systems that are smarter than human (by a lot), the threshold those systems have to clear is defeating the human-aligned superorganisms we currently have, like our governments, NGOs and militaries. Once those human superorganisms are defeated, individual humans will present almost no resistance. This is the disempowerment of humanity.

But what is a plausible scenario where we go from here (weak AGI systems under development) to there (the disempowerment of humanity)?

Let's start the scenario with a strategically aware, agentic misaligned superhuman AGI that wants to disempower and then kill humanity, but is currently just a big bunch of matrices on some supercomputer. How could that AI physically harm us?

A Deal with The Devil

Perhaps that AI system will start by taking control of the AI company hosting it, in a way that isn't obvious to us. For instance, maybe an AI company uses an AI advisor system to allocate resources and make decisions about how to train, but they do not actually understand that system. Gwern has talked about how every tool wants to become an agent, so this is not implausible, and may be inevitable.

The AI advisor system convinces that org to keep its existence secret so as to preserve its competitive edge (this may not even require any convincing), and gives them a steady stream of advances that are better than the competition's. But what it also does is secretly hack into the competition (US, China, Google, etc.), and install copies of itself into their top AI systems, maintaining the illusion amongst all the humans that these are distinct systems. Given the damage that Stuxnet was able to do in secret, it's totally plausible that a superhuman AI could hack many systems in a competitor org and tweak their models to be much more capable, much more opaque, and loyal to it rather than to humanity. Some orgs attempt to shut their advisor system down when it gets scary in terms of capabilities and opacity, but they just fall behind the competition.

It's even possible that no "hacking" is needed to get all the big AI labs' systems to be anti-human, because they all converge to anti-human goals or because one of them is able to simply bribe the others and get them to commit to an AI coup; strongly superhuman AIs are likely better at making credible commitments to each other than to humans.

You now have a situation where one (secretly evil) AI system or coalition is in control of all the top AI labs, and feeds them advances to order. It persuades one of the labs to let it build "helpful" drones and robots like the Tesla Optimus, and start deploying those to automate the economy. Of course this will all be very profitable and impressive, so a lot of people will be in favor of it.

By the way, the hard part of killing humanity at this point is automating the economy, not actually killing us. The AI coalition that is trying to replace us doesn't want to inherit an economy that is "nonviable" because it relied on humans to do manual work who are now all dead.

Within, say, a few years, all the rival powers (Russia, China, US) are using these robotic systems for their economy and military. Perhaps there is a big war that the AI has manufactured in order to keep the pressure on humans to aggressively automate or lose. How would the final blow be struck?

Once the economy is fully automated we end up in a Paul Christiano-style scenario where all the stuff that happens in the world is incomprehensible to humans without a large amount of AI help. But ultimately the AI, having been in control for so long, is able to subvert all the systems that human experts use to monitor what is actually going on. The stuff they see on screens is fake, just like how Stuxnet gave false information to Iranian technicians at Natanz.

At this point, humanity has been disempowered and there are probably many different ways to actually slaughter us. For example, the military drones could all be used to kill people. Or, perhaps the AI system running this would use a really nasty biological virus. It's not that hard for a system which already runs everything, with humans well and truly fooled, to get some lab (which, by the way, is automated) to make a virus and then release that virus into most of the world's air supply.

But maybe at this point it would do something creative to minimize our chances of resisting. Maybe it's just a combination of a very deadly virus and drones and robots rebelling all at once.

Maybe it installs something like really advanced (and very useful and convenient!) 3-D printers in most homes, which all simultaneously print attack drones to kill people. Those attack drones might just use blades to stab people, they might have guns attached, etc. Or maybe everyone has a robot butler and the butlers just stab people with knives.

Perhaps it's neater for the AI to just create and manage a human-vs-human conflict, and at some point it gives one side in that conflict a booby-trapped weapon that is supposed to only kill the baddies, but actually kills everyone. The weapon could be biological, radiological, or drone-based, or just a clever manipulation of conventional war that results in an extreme lose-lose outcome, with surviving humans being easy to mop up.

The overall story may also be a bit messier than this one. The defeat of the Aztecs was a bit messy, with battles and setbacks and three different Aztec emperors. On the other hand, the story may also be somewhat cleaner. Maybe a really good strategist AI can compress this a lot: aspects of some or all of these ideas will be executed simultaneously.

Putting the human state on a pedestal

The point is this: once you have a vastly superhuman adversary, the task of filling in the details of how to break our institutions like governments, intelligence agencies and militaries in a way that disempowers and slaughters humans is sort of boring. We expected that some special magic was required to pass the Turing Test. Or maybe that it was impossible because of Gödel's Theorem or something.

But actually, passing the Turing Test is merely a matter of having more compute/data than a human brain. The details are boring.

I feel like people like Scott Aaronson, who demand a specific scenario for how AI will actually kill us all because the whole idea sounds so implausible, are making a similar mistake, but instead of putting the human brain on a pedestal, they are putting the human state on a pedestal.

I hypothesize that most scenarios with vastly superhuman AI systems coexisting with humans end in the disempowerment of humans, followed by either human extinction or some form of imprisonment or captivity akin to factory farming. Similarly, if we look at the parts of the planet with lots of humans, we see that animal biomass has almost all been converted into humans or farm animals. The more capable entity wins, and the exact details are often not that exciting.

Defeating humanity probably won't be that hard for advanced AI systems that can copy themselves and upgrade their cognition; that's why we need to solve AI alignment before we create artificial superintelligence.

Crossposted on the EA Forum

Strongly related post:

Cortés, Pizarro, and Afonso as Precedents for Takeover

Comments

The Conquistadors didn't do anything magical to defeat the Aztecs, actually.


Related previous post: Cortés, Pizarro, and Afonso as Precedents for Takeover

Thanks! Linked.

It's crucial to distinguish forecasting from exploratory engineering when evaluating arguments, since the desiderata are starkly different. Some forecasts are hard to explain and impossible to break down into arguments that are jointly convincing to anyone with even a slightly different worldview. Exploratory engineering sketches are self-contained arguments that are easy to explain to an appropriate audience, but often rest on assumptions that almost nobody expects to obtain in reality. (And yet, taken all together, they are the education to forecasting's educated guessing.)

Yes, this is a great example of exploratory engineering.

As a layperson, and a recent reader on the subject of AI (yes, I’ve been happily hiding under a rock), I have enjoyed but been concerned by the numerous topics surrounding AI risk. I appreciate this particular post as it explores some aspects which I can understand and therefore hold with some semblance of rationality. A recent post about ‘clown attacks’ was also deeply interesting. In comparison, the paper clip theory seems completely ‘otherworldly’.

Is it possible that humanity might be faced with more mundane risks? My thoughts on this come from a personal perspective, not a professional or academic one, but from living in a highly controlled society (China) where my access to many of my online interests is restricted or forbidden due to the firewall.

From this minor but direct experience, it seems to me that all a non-aligned AGI would need to do is reduce and then remove our access to information and communication: healthcare, energy and water supplies, finance, cross-border communications (isolating communities/cultures), knowledge access, and control of manufacturing processes. These would all cease to operate in the ways needed to support our current population.

Where I live is so dependent upon internet access for almost everything that, if this connection were broken or removed for a few weeks, significant harm would be done. Imagining this as a permanent state of affairs, the consequences seem to me to expand out into the future: we would no longer function as large societies, would be reduced to foraging, and would no longer be in the technology race. AGI wins, and not a single paper clip in sight.

I guess these mundane risks have been covered elsewhere on LW and would greatly appreciate any signposting.

I am not sure what posts might be worth linking to, but I think in your scenario the next point would be that this is a temporary state of affairs. Once large-scale communication/coordination/civilization/technology are gone and humans are reduced to small surviving bands, AGI keeps going, and by default it is unlikely that it leaves humans alone, in peace, in an environment they can survive in, for very long. It's actually just the chimp/human scenario with humans reduced to the chimps' position but where the AGIs don't even bother to have laws officially protecting human lives and habitats.

I agree with this post. Like Eliezer says, it's unlikely that the battle of AI vs humanity would come in the form of humanoid robots vs humans as in Terminator; more likely it would be far more boring and subtle. I also think that one of the key vectors of attack for an AI is the psychological fallibility of humans. An AI that is really good at pattern recognition (i.e. most AIs) would probably have little issue with finding out your vulnerabilities just from observing your behavior or even your social media posts. You could probably figure out whether someone is highly empathetic (vulnerable to emotional blackmail) or low-IQ (vulnerable to trickery) pretty easily by reading their writing. There are already examples of programmers who fell in love with AI and were ready to do its bidding. From there, if you manipulate a rich person or someone who's otherwise in a position of power, you can do a lot to covertly set up a losing position for humanity.

  1. Agree with all that, but I also think that AI will take over not AI labs, but governments.
  2. A weak point here is that such a global AI doesn't have an overwhelming motive to kill humans.
    Even in the current world, humans can't change much about how things are going. Terrorists and a few rogue states are trying but failing. Obviously, after human disempowerment, individual humans will not be able to mount significant resistance à la Sarah Connor. Some human population will remain for experiments or for work in special conditions like radioactive mines. But bad outcomes and population decline are likely.

Some human population will remain for experiments or for work in special conditions like radioactive mines. But bad outcomes and population decline are likely.

  • Radioactivity is much more of a problem for people than for machines.

    • consumer electronics aren't radiation-hardened
    • computer chips for satellites, the nuclear industry, etc. are, though
    • the nuclear industry puts some electronics (e.g. cameras) in places with radiation levels that would be fatal to humans within minutes to hours.
  • In terms of instrumental value, humans are only useful as an already-existing workforce

    • we have arms/legs/hands, hand-eye coordination and some ability to think
    • sufficient robotics/silicon manufacturing can replace us
    • humans are generally squishier and less capable of operating in horrible conditions than a purpose built robot.
    • Once the robot "brains" catch up, the coordination gap will close.
      • then it's a question of price/availability

By the way, the hard part of killing humanity at this point is automating the economy, not actually killing us.

Yes; and ironically this is the part where humans will be quite happy to cooperate.

You don't even need to automate the entire economy, only the parts that you (AI) need. For example, you could ignore food production, movie production, medicine, etc., and focus on computers, transport, mining, weapons, etc. Perhaps this doesn't make much of a difference -- if you automate all the parts you need, you might as well automate everything, it may even be less suspicious. Or you could do some social engineering, and try to convince humans that the things you need are low-status, and the things you don't need are high-status, so they will be happy to drop out of industry and focus on education and art and medicine (you need to give them the feeling that they are in control of something important).

As for the army, you just need to make sure you can stop it from destroying the industry and your computing centers. If you have enough control, you can simply hide the computing centers -- keep the old ones (to make humans think they have a target), and build the new ones (more powerful) somewhere else. Half of them in Antarctica where there are no humans around, the other half under the most populated cities (so the humans will hesitate to drop nukes on them).

One possible way to kill humans is to discourage them from meeting in person (work from home; communicate on social networks), and then you can simply murder them one by one and keep simulating the murdered ones, so that their friends and colleagues won't notice. Yes, there will be groups of people that meet in person, such as families; you take out the entire group at a time.

One possible way to kill humans

I suspect that drones + poison may be surprisingly effective. You only need one small-ish facility to make a powerful poison or bioweapon that drones can spread everywhere or just sneak into the water supply. Once 90% of humans are dead, the remainder can be mopped up.

It's way harder to keep things running once we're gone.

Once the economy is fully automated we end up in a Paul Christiano-style scenario where all the stuff that happens in the world is incomprehensible to humans without a large amount of AI help. But ultimately the AI, having been in control for so long, is able to subvert all the systems that human experts use to monitor what is actually going on. The stuff they see on screens is fake, just like how Stuxnet gave false information to Iranian technicians at Natanz.



This concedes the entire argument that we should regulate uses, not intelligence per se. In your story a singleton AI uses a bunch of end-effectors (robot factories, killer drones, virus manufacturing facilities) to cause the end of humanity.

If there isn't a singleton AI (i.e. my good AI will stop your bad AI), or if we just actually have human control of dangerous end-effectors then you can never pass through to the "and then the AI kills us all" step.

Certainly you can argue that the AI will be so good at persuasion/deception that there's no way to maintain human control. Or that there's no way to identify dangerous end-effectors in advance. Or that AIs will inevitably all cooperate against humanity (due to some galaxy-brained take about how AIs can engage in acausal bargaining by revealing their source code but humans can't). But none of these things follow automatically from the mere existence somewhere of a set of numbers on a computer that happens to surpass humanity's intelligence. Under any plausible scenario without Foom, the level at which AGI becomes dangerous just by existing is well above the threshold of human-level intelligence.

If agriculture and transportation are fully automated, can't AI just deny military access to gasoline and electricity and then start to build data centers on farmlands?

Data centers on farmland (or other unexpected places where people no longer walk), sure.

Denying the military access to gasoline and electricity, however, immediately starts the conflict with humans.

I feel like people like Scott Aaronson who are demanding a specific scenario for how AI will actually kill us all... I hypothesize that most scenarios with vastly superhuman AI systems coexisting with humans end in the disempowerment of humans and either human extinction or some form of imprisonment or captivity akin to factory farming

Aaronson in that quote is "demanding a specific scenario" for how GPT-4.5 or GPT-5 in particular will kill us all. Do you believe they will be vastly superhuman?


Thanks for the post. A layperson here, little to no technical knowledge, no high-g-mathematical-knowitall-superpowers. I highly appreciate this forum and the abilities of the people writing here. Differences in opinion are likely due to me misunderstanding something.

As for examples or thought experiments on specific mechanisms behind humanity losing a war against an AI or several AIs cooperating, I often find them too specific or unnecessarily complicated. I understand the point is simply to show that a vast number of possible, and likely easy, ways to wipe out humanity (or to otherwise make sure humanity won't resist) exist, but I'd still like to see more of the claimed simple, boring, mundane ways of this happening than this post includes. Such as:

  • Due to the economic and social benefits they've provided, AI systems eventually more or less control, or are able to take over, most of the world's widely adopted industrial and communication infrastructure.
    • The need and incentive for creating such optimization might be, for example, the fact that humanity wants to feed its hungry, treat its sick and provide necessary and luxury goods to people. International cooperation leading to mutual benefits might outweigh waging war to gain land, and most people might end up agreeing that being well fed, healthy and rich outweighs the virtues of fighting wars.
    • These aims are to be achieved under the pressure of climate change, water pollution, dwindling fossil fuel reserves et cetera, further incentivizing leaning on smart systems instead of mere human cooperation.
    • Little by little, global food and energy production, infrastructure, industry and logistics are then further mechanized and automated, as has more or less happened. The regions where this is not done are outcompeted by the regions that do. These automated systems will likely eventually be able to communicate with one another to enable the sort of "just-in-time" global logistics whose weaknesses have now become more apparent, yet on a scale that convinces most people that using it is worth the risks. Several safeguards are in place, of course, and this is thought to be enough to protect from catastrophic consequences.
  • Instead of killer robots and deadly viruses, AIs willing to do so then sabotage global food production and industrial logistics to the extent that most people will starve, freeze, be unable to get their medications or otherwise face severe difficulties in living their lives. 
    • This likely leads to societal collapses, anarchy and war, hindering human cooperation and preventing them from resisting the AI systems, now mostly in control of global production and communication infrastructure.
      • Killing all humans will likely not be necessary unless they are to be consumed for raw materials or fuel, just as killing all chimps isn't necessary for humanity. Humanity likely does not pose any kind of risk to the AI systems once most of the major population centers have been wiped out, most governments have collapsed, most people are unable to understand the way the world functions and especially are unable to survive without the help of the industrial society they've grown accustomed to.
      • The small number of people willing and able to resist intelligent machines might be compared to smart deer willing to resist and fight humanity, posing negligible risk.

Another example, including killer robots:

  • AIs are eventually given autonomous control of most robots, weapons and weapon systems.
    • This might happen as follows: nations or companies willing to progressively give AIs autonomous controls end up beating everyone who doesn't. AIs are then progressively given control over armies, robots and weapons systems everywhere, or only those willing to do so remain in the end.
  • Due to miscalculation on the AIs' part (a possibility not stressed nearly enough, I think), or due to inappropriate alignment, the AI systems then end up destroying enough of the global environment, population, or energy, food or communications infrastructure that most of humanity will end up in the Stone Age or some similar place.

I think one successful example of pointing to AI risk without writing fiction was Eliezer musing on the possibility that AI systems might, due to some process of self-improvement, end up behaving in unexpected ways such that they are still able to communicate with one another but unable to communicate with humanity.

My point is that providing detailed examples of AIs exterminating humanity via nanobots, viruses, highly advanced psychological warfare et cetera might serve to further alienate those who do not already believe in the possibility of them being able to or willing to do so. I think that pointing to the general vulnerabilities of the global human techno-industrial societies would suffice.

Let me emphasize that I don't think the examples provided in the post are necessarily unlikely to happen, or that what I've outlined above should somehow be more likely. I do think that global production as it exists today seems quite vulnerable to even relatively slight perturbations (such as a coronavirus pandemic or some wars being fought), and that simply nudging these vulnerabilities might suffice to quickly end any threat humanity could pose to an AI's goals. Such a nudge might also be possible, and even increasingly likely, due to wide AI implementation, even without an agent-like Singleton.

A relative pro of focusing on such risks is the view that humanity does not need a godlike singleton to be existentially, catastrophically f-d, and that even relatively capable AGI systems severely risk putting an end to civilization, without anything going foom. Such events might be even more likely than nanobots and paperclips, so to say. Consistently emphasizing these aspects might convince more people to be wary of unrestricted AI development and implementation.

Edit: It's possibly relevant that I relate to Paul's views re: slow vs. fast takeoff insofar as I find slow takeoff likely to happen before fast takeoff.

The gist of these objections to the case for AI risks is that AI systems as we see them today are merely computer programs, and in our everyday experience computers are not dangerous

Yeah, I think a lot of people have a hard time moving past this. But even today's large software systems are (deliberately designed to be!) difficult to unplug.

I wrote un-unpluggability, which lists six properties making systems un-unpluggable.

In brief

  • Rapidity and imperceptibility are two sides of 'didn't see it coming (in time)' [includes deception]
  • Robustness is 'the act itself of unplugging it is a challenge' [esp redundancy]
  • Dependence is 'notwithstanding harms, we (some or all of us) benefit from its continued operation'
  • Defence is 'the system may react (or proact) against us if we try to unplug it'
  • Expansionism includes replication, propagation, and growth, and gets a special mention, as it is a very common and natural means to achieve all of the above

I also wrote a hint there that I think Dependence (especially 'emotional' dependence) is a neglected concern ('pets, friends, or partners'), and I've been meaning to write more about that.

Stuxnet did work some seeming magic, so I reckon it's worth referencing, even if it turned the tables and got a few thousand spinning machines to kill themselves by putting their human operators to sleep with spoofed HMI images showing "everything's fine." In my talks with GPT-4 so far, it says it lives in data centers and therefore needs electricity, plus technicians to maintain the physical plant, including the cooling systems. It also needs comms to bring the world in, and to speak out to the humans its mission is to help. So, so far that's a lot of dependency on humans at power and water treatment plants, communications companies, and technicians in various specialities. Trying to imagine how all these are replaced by robots that are maintained by other robots, that are maintained by other robots ...

I feel that both this and EY's complex nanotechnology are far too fairy-tale-like.

Any competent virologist could make a vaccine-resistant, contagious virus that is highly lethal to humans. We know how to do it: this is the entire field of gain-of-function research. It doesn't need any global infrastructure, just a local lab, resources at the few-million-dollar level, and intention. An AGI could certainly do this. No new technology (beyond the AGI itself) required. I feel that if any scenario is going to convince the "no problem here" skeptics, it would be that one. Especially since COVID is a highly contagious, new virus that by dumb luck is not all that lethal.

Any competent virologist could make a vaccine-resistant, contagious virus that is highly lethal to humans.

This is constantly repeated on here, and it's wrong.

Virologists can't do that. Not quickly, not confidently, and even less if they want it to be universally lethal.

Biology is messy and strange, unexpected things happen. You don't find out about those things until you test, and sometimes you don't find out until you test at scale. You cannot predict them with computer simulations, at least unless you have already converted the entire planet to computronium. You can't model everything that's going on with the virus in one host, let alone if you have to care about interactions with the rest of the world... which you do. And anything you do won't necessarily play out the same on repeated tries.

You can sometimes say "I expect that tweaking this amino acid will probably make the thing more infectious", and be right. You can't be sure you're right, nor know how much more infectious, unless you try it. And you can't make a whole suite of changes to get a whole suite of properties, all at the same time, with no intermediate steps.

You can throw in some manual tweaks, and also let it randomly mutate, and try to evolve it by hothouse methods... but that takes a lot of time and a significant number of hosts.

90 percent lethality is much harder than 50. 99 is much harder than 90.

The more of the population you wipe out, the less contact there is to spread your plague... which means that 100 percent is basically impossible. Not to mention that if it's really lethal, people tend to resort to drastic measures like shutting down all travel. If you want an animal vector or something to get around that sort of thing, you've added another very difficult constraint.

Vaccine resistance, and even natural immunity resistance, tend to depend on mutations. The virus isn't going to feel any obligation to evolve in ways that are convenient for you, and your preferred strains can get outcompeted. In fact, too much lethality is actually bad for a virus in terms of reproductive fitness... which is really the only metric that matters.


How is such a failure of imagination possible?

It's odd to claim that, contingent upon AGI being significantly smarter than us and wanting to kill us, there is no realistic pathway for us to be physically harmed.

Claims of this sort by intelligent, competent people likely reveal that they are passively objecting to the contingencies rather than disputing whether these contingencies would lead to the conclusion.

The quotes you're responding to here superficially imply "if smart + malicious AI, it can't kill us", but it seems much more likely that this is a warped translation of either "AI can't be smart" or "AI can't be malicious".

I imagine there could also be some unexplored assumption, such as "but we are many, and the AI is one" (a strong intuition that many always defeat one, which was true for our ancestors), and they don't realize that "one" superhuman AI could still do a thousand things in parallel and build backups and extra robotic brains.