Summary: Yudkowsky argues that an unaligned AI will figure out a way to create self-replicating nanobots, and merely having internet access is enough to bring them into existence. Because of this, it can very quickly replace all human dependencies for its existence and expansion, and thus pursue an unaligned goal, e.g. making paperclips, which will most likely end up in the extinction of humanity.
I however will write below why I think this description massively underestimates the difficulty in creating self-replicating nanobots (even assuming that they are physically possible), which requires focused research in the physical domain, and is not possible without involvement of top-tier human-run labs today.
Why it matters? Some of the assumptions of pessimistic AI alignment researchers, especially by Yudkowsky, rest fundamentally on the fact that the AI will find quick ways to replace humans required for the AI to exist and expand.
- We have to get AI alignment right the first time we build a Super-AI, and there are no ways to make any corrections after we've built it
- As long as the AI does not have a way to replace humans outright, even if its ultimate goal may be non-aligned, it can pursue proximate goals that are aligned and safe for it to do. Alignment research can continue and can attempt to make the AI fully aligned or shut it down before it can create nanobots.
- The first time we build a Super-AI, we don't just have to make sure it's aligned, but we need it to perform a pivotal act like create nanobots to destroy all GPUs
- I argue below that this framing may be bad because it means performing one of the most dangerous steps first — creating nanobots — which may be best performed by an AI that is much more aligned than a first attempt
What this post is not about: I make no argument about the feasibility of (non-biological) self-replicating nanobots. There may be fundamental reasons why they are impossible/difficult (even for superintelligent AIs)/will not outcompete biological life, an interesting question that is explored more by bhaut. I also don't claim that AI alignment doesn't matter. I think that it's extremely important, but I think it's unlikely that (1) a one-shot process will lead to it and also (2) that one-shot is necessary; I actually think that this kind of thinking increases risk.
Finally, I don't claim that there aren't easier ways to kill all, or almost all, humans, for example pandemics or causing nuclear wars. However, most of these scenarios do not leave any good paths for an AI to expand because there would be no way to get more of its substrate (e.g. GPUs) or power supplies.
Core argument
Building something like a completely new type of nanobot is a massive research undertaking. Even under the assumption that an AI is much more intelligent and can learn and infer from much less data, it cannot do so from no data.
Building a new type of nanobot (not based on biological life) requires not just the ability to design from existing capabilities, but actually doing completely new experiments on how the nanomachinery that is going to be used to do this interacts with itself and the external world. It isn't possible to completely cut out all experiments from the design process, because at least some of the experiments will be about how the physical world works. If you don't know anything about physics, you clearly can't design any kind of machine; I am pretty certain that right now we do not know enough about nanomachines to design a new kind of non-biological self-replicating nanobot that immediately works out of the box.
To build it, you would need high quality labs to do very well specified experiments, build prototypes in later stages and report detailed information on how they failed, until you could arrive at a first sample of a self-replicating nanobot, at which point the AGI might be in a position to replace all humans.
Counterargument 1: We can build some complex machines from blueprints, and they work the first time. As an example, we can certainly design a complex electronics product, manufacture the PCB and add all the chips and other parts. If an experienced engineer does this, there is a good chance it will work the first time. However, new nanomachines would be different, because they cannot be assembled from parts that are already extremely well studied in isolation. We make chips such that when they are used in their specified way, their behaviour is extremely predictable, but no such components currently exist in the world of nanomachines.
Counterargument 2: The AI can simply simulate everything instead of performing any physical experiments. All of the required laws of physics are known: The standard model describes the microscopic world extremely well, and (microscopic) gravity is irrelevant for constructing nanobot, so no (currently unknown) physical theory unifying all laws would be required. While it is indeed possible or even likely that the standard model theoretically describes all details of a working nanobot with the required precision, the problem is that in practice it is impossible to simulate large physical systems using it. Many complex physical systems are still largely modelled empirically (ad-hoc models validated using experiments) rather than it being possible to derive them from first principles. While physicists sometimes claim to derive things from first principles, in practice these derivations often ignore a lot of details which still has to be justified using experiments. An AI can also make progress on better simulation, but simulating complex nanomachines outright is exceedingly unlikely.
Counterargument 3: Nanobots already exist, the AI will just use existing biology. Existing biology is indeed good for making self-replicating nanobots, but at least two difficult problems will remain: To make any kind of effective use of the network of nanobots, it will require creating a communication network using cells that allows them to come together to execute some more complex software to at least connect to the internet (and thus back to the AI). That's still a monumental task to achieve using biological systems and would still require a lot of research.
Counterargument 4: The AI can do the experiments in secret, or hide the true nature of the experiments in things that seem aligned. This could certainly be relevant in the long run, especially if we want the AI to solve complex problems. But on shorter timescales, most of what the AI would need to learn is going to be extremely specific to nanomachines. You do not get data about this by making completely unrelated experiments that do not involve nanotechnology.
Significance for AI alignment
I don't claim that this means we don't need alignment, or that an AI won't eventually be able to build nanobots (if it is feasible at all, of course) — just that it seems highly possible to delay this step by years, if it is the intention of the operator to do so (and it has some minimal cooperation on this from the rest of the world).
This means that it is possible to study AIs with capabilities potentially far exceeding human capabilities. Alignment is likely an iterative process and no one-shot solution exists, but that's probably ok, because well-enough aligned AIs can coexist with humans, be studied, and be improved for the next iteration, without immediately seeing human bodies only as bags of atoms to be harvested to do other things.
Pivotal acts
I think pivotal acts may be a bad idea in general. The arguments for this have been spelled out before, for example by Andrew_Critch. However, even if one believes (a) in the feasibility of nanobots and (b) pivotal acts are necessary, then using nanobots to carry out a pivotal action might be a really bad idea.
If someone decides that a pivotal act should be carried out using nanobots (either on their own or by this being the suggested best option by an AI), they might be inclined to do anything to perform any physical acts necessary for the AI to achieve this, making the AI much more dangerous if it is not perfectly aligned (which in itself may be an impossible problem). Pivotal acts that do not require giving an AI full human-equivalent or better physical capabilities would be much safer (probably still a bad idea).
How could this argument change in the future
I think my argument that building nanobots without massive help from first-tier human labs is true now and for at least several more years. However, over several decades, some things might change substantially, for example:
1. Production processes could be much more automated than they are now. If factories exist that can make new, complex machines without major retooling, they could make it much simpler for an AI to perform completely new tasks in the physical world with minimal human interaction
2. Robotics can advance. Humanoid robots that can peform many physical human tasks may make it possible for the AI to build completely human-independent labs.
3. More research into building nanomachines that eliminates more of the unknowns.
4. More biotech research could also allow more control of the physical world, for example if cell networks can be built to perform some tasks.
5. It is maybe possible that quantum computers are powerful enough to simulate much more complex physical processes than is possible on classical computers, and thus an AI with access to a quantum computer may be able to massively reduce the number of experiments necessary to construct nanobots. (Feels unlikely to me but cannot a priori be excluded)
So whether an AI can achieve nanobots just via internet access will potentially have to be re-evaluated in the future when one or more of these are developed.
What this shouldn't be taken as
I am not arguing alignment is not important, in fact I think it is very important.
1. Regardless of the feasibility of nanobots, I think there are probably vastly easier ways to kill all humans, however they would leave an AI without a practical way to continue existing or expanding.
2. It is also possible that many scenarios exist where an AI does (1) by accident.
3. AIs don't need nanobots to take control of humans and human institutions. There are many other ways that involve using humans against each other and are probably exploitable by much less powerful AIs. (Crucially, however, they do depend on some humans and might require different tools to control AGI risk.)
4. I don't think that this makes the AI alignment trivial to solve, or claim that this gives a recipe to solve it. I just think that it may be fruitful to look into research that starts from moderately aligned AIs and figures out how to get them more aligned rather than having to perform a very risky one-shot experiment.
Likewise :)
Also, sorry about the length of this reply. As the adage goes: "If I had more time, I would have written a shorter letter."
That seems to be one of the relevant differences between us. Although I don't think it is the only difference that causes us to see things differently.
Other differences (I guess some of these overlap):
From my perspective, I don't see how your reasoning is qualitatively distinct from saying in the 1500s: "We will for sure never be able to know what the sun is made out of, since we won't be able to travel there and take samples."
Even if we didn't have e.g. the standard model, my perspective would still be roughly what it is (with some adjustments to credences, but not qualitatively so). So to me, us having the standard model is "icing on the cake".
Eliezer says "A Bayesian superintelligence, hooked up to a webcam, would invent General Relativity as a hypothesis (...)". I might add more qualifiers (replacing "would" with "might", etc). I think I have wider error-bars than Eliezer, but similar intuitions when it comes to this kind of thing.
Speaking of intuitions, one question that maybe gets at deeper intuitions is "could AGIs find out how to play theoretically perfect chess / solve the game of chess?". At 5/1 odds, this is a claim that I myself would bet neither for nor against (I wouldn't bet large sums at 1/1 odds either). While I think people of a certain mindset will think "that is computationally intractable [when using the crude methods I have in mind]", and leave it at that.
As to my credences that a superintelligence could "oneshot" nanobots[1] - without being able to design and run experiments prior to designing this plan - I would bet neither "yes" or "no" to that a 1/1 odds (but if I had to bet, I would bet "yes").
But it would have other information. Insofar as it can reason about the reasoning-process that it itself consists of, that's a source of information (some ways by which the universe could work would be more/less likely to produce itself). And among ways that reality might work - which the AI might hypothesize about (in the absence of data) - some will be more likely than others in a "Kolmogorov complexity" sort of way.
How far/short a superintelligence could get with this sort of reasoning, I dunno.
Here is an excerpt from a TED-talk from the Wolfram Alpha that feels a bit relevant (I find the sort of methodology that he outlines deeply intuitive):
"Well, so, that leads to kind of an ultimate question: Could it be that someplace out there in the computational universe we might find our physical universe? Perhaps there's even some quite simple rule, some simple program for our universe. Well, the history of physics would have us believe that the rule for the universe must be pretty complicated. But in the computational universe, we've now seen how rules that are incredibly simple can produce incredibly rich and complex behavior. So could that be what's going on with our whole universe? If the rules for the universe are simple, it's kind of inevitable that they have to be very abstract and very low level; operating, for example, far below the level of space or time, which makes it hard to represent things. But in at least a large class of cases, one can think of the universe as being like some kind of network, which, when it gets big enough, behaves like continuous space in much the same way as having lots of molecules can behave like a continuous fluid. Well, then the universe has to evolve by applying little rules that progressively update this network. And each possible rule, in a sense, corresponds to a candidate universe.
Actually, I haven't shown these before, but here are a few of the candidate universes that I've looked at. Some of these are hopeless universes, completely sterile, with other kinds of pathologies like no notion of space, no notion of time, no matter, other problems like that. But the exciting thing that I've found in the last few years is that you actually don't have to go very far in the computational universe before you start finding candidate universes that aren't obviously not our universe. Here's the problem: Any serious candidate for our universe is inevitably full of computational irreducibility. Which means that it is irreducibly difficult to find out how it will really behave, and whether it matches our physical universe. A few years ago, I was pretty excited to discover that there are candidate universes with incredibly simple rules that successfully reproduce special relativity, and even general relativity and gravitation, and at least give hints of quantum mechanics."
As I understand it, the original experiment humans did to test for general relativity (not to figure out that general relativity probably was correct, mind you, but to test it "officially") was to measure gravitational redshift.
And I guess redshift is an example of something that will affect many photos. And a superintelligent mind might be able to use such data better than us (we, having "pathetic" mental abilities, will have a much greater need to construct experiments where we only test one hypothesis at a time, and to gather the Bayesian evidence we need relating to that hypothesis from one or a few experiments).
It seems that any photo that contains lighting stemming from the sun (even if the picture itself doesn't include the sun) can be a source of Bayesian evidence relating to general relativity:

It seems that GPS data must account for redshift in its timing system. This could maybe mean that some internet logs (where info can be surmised about how long it takes to send messages via satellite) could be another potential source for Bayesian evidence:
I don't know exactly what and how much data a superintelligence would need to surmise general relativity (if any!). How much/little evidence it could gather from a single picture of an apple I dunno.
I disagree with this.
First off, it makes sense to consider theories that explain more observations than just the ones you've encountered.
Secondly, simpler versions of physics do not explain your observations when you see 2 webcam-frames of a falling apple. In particular, the colors you see will be affected by non-Newtonian physics.
Also, the existence of apples and digital cameras also relates to which theories of physics are likely/plausible. Same goes for the resolution of the video, etc, etc.
You say that so definitively. Almost as if you aren't really imagining an entity that is orders of magnitude more capable/intelligent than humans. Or as if you have ruled out large swathes of the possibility-space that I would not rule out.
If an AGI is superintelligent and malicious, then surviving/expanding (if it gets onto the internet) seems quite clearly feasible to me.
We even have a hard time getting corona-viruses back in the box! That's a fairly different sort of thing, but it does show how feeble we are. Another example is illegal images/videos, etc (where the people sharing those are humans).
An AGI could plant itself onto lots of different computers, and there are lots of different humans it could try to manipulate (a low success rate would not necessarily be prohibitive). Many humans fall for pretty simple scams, and AGIs would be able to pull off much more impressive scams.
Here you speak about how humans work - and in such an absolutist way. Being feeble and error-prone reasoners, it makes sense that we need to rely heavily on experiments (and have a hard time making effective use of data not directly related to the thing we're interested in).
I think protein being "solved" exemplifies my perspective, but I agree about it not "proving" or "disproving" that much.
When it comes to predictable properties, I think there are other molecules where this is more the case than for biological ones (DNA-stuff needs to be "messy" in order for mutations that make evolution work to occur). I'm no chemist, but this is my rough impression.
Ok, so you acknowledge that there are molecules with very predictable properties.
It's ok for much/most stuff not to be predictable to an AGI, as long as the subset of stuff that can be predicted is sufficient for the AGI to make powerful plans/designs.
Even IF that is the case (an assumption that I don't share but also don't rule out), design-plans may be made to have experimentation built into them. It wouldn't necessarily need to be like this:
I could give specific examples of ways to avoid having to do it that way, but any example I gave would be impoverished, and understate the true space of possible approaches.
I read the scenario he described as:
I interpreted him as pointing to a larger possibility-space than the one you present. I don't think the more specific scenario you describe would appear prominently in his mind, and not mine either (you talk about getting "some scientists in a lab to mix it together" - while I don't think this would need to happen in a lab).
Here is an excerpt from here (written in 2008), with boldening of text done by me:
"1. Crack the protein folding problem, to the extent of being able to generate DNA
strings whose folded peptide sequences fill specific functional roles in a complex
chemical interaction.
2. Email sets of DNA strings to one or more online laboratories which offer DNA
synthesis, peptide sequencing, and FedEx delivery. (Many labs currently offer this
service, and some boast of 72-hour turnaround times.)
3. Find at least one human connected to the Internet who can be paid, blackmailed,
or fooled by the right background story, into receiving FedExed vials and mixing
them in a specified environment.
4. The synthesized proteins form a very primitive “wet” nanosystem which, ribosomelike, is capable of accepting external instructions; perhaps patterned acoustic vibrations delivered by a speaker attached to the beaker.
5. Use the extremely primitive nanosystem to build more sophisticated systems, which
construct still more sophisticated systems, bootstrapping to molecular
nanotechnology—or beyond."
Btw, here are excerpts from a TED-talk by Dan Gibson from 2018:
"Naturally, with this in mind, we started to build a biological teleporter. We call it the DBC. That's short for digital-to-biological converter. Unlike the BioXp, which starts from pre-manufactured short pieces of DNA, the DBC starts from digitized DNA code and converts that DNA code into biological entities, such as DNA, RNA, proteins or even viruses. You can think of the BioXp as a DVD player, requiring a physical DVD to be inserted, whereas the DBC is Netflix. To build the DBC, my team of scientists worked with software and instrumentation engineers to collapse multiple laboratory workflows, all in a single box. This included software algorithms to predict what DNA to build, chemistry to link the G, A, T and C building blocks of DNA into short pieces, Gibson Assembly to stitch together those short pieces into much longer ones, and biology to convert the DNA into other biological entities, such as proteins.
This is the prototype. Although it wasn't pretty, it was effective. It made therapeutic drugs and vaccines. And laboratory workflows that once took weeks or months could now be carried out in just one to two days. And that's all without any human intervention and simply activated by the receipt of an email which could be sent from anywhere in the world. We like to compare the DBC to fax machines.
(...)
Here's what our DBC looks like today. We imagine the DBC evolving in similar ways as fax machines have. We're working to reduce the size of the instrument, and we're working to make the underlying technology more reliable, cheaper, faster and more accurate.
(...)
The DBC will be useful for the distributed manufacturing of medicine starting from DNA. Every hospital in the world could use a DBC for printing personalized medicines for a patient at their bedside. I can even imagine a day when it's routine for people to have a DBC to connect to their home computer or smart phone as a means to download their prescriptions, such as insulin or antibody therapies. The DBC will also be valuable when placed in strategic areas around the world, for rapid response to disease outbreaks. For example, the CDC in Atlanta, Georgia could send flu vaccine instructions to a DBC on the other side of the world, where the flu vaccine is manufactured right on the front lines."
I'm no expert on this, but what you say here seems in line with my own vague impression of things. As you maybe noticed, I also put "solved" in quotation marks.
As touched upon earlier, I am myself am optimistic when it comes to iterative plans for alignment. But I would prefer such iteration to be done with caution that errs on the side of paranoia (rather than being "not paranoid enough").
It would be ok if (many of the) people doing this iteration would think it unlikely that intuitions like Eliezer's or mine are correct. But it would be preferable for them to carry out plans that would be likely to have positive results even if they are wrong about that.
Like, you expect that since something seems hopeless to you, a superintelligent AGI would be unable to do it? Ok, fine. But let's try to minimize the amount of assumptions like that which are loadbearing in our alignment strategies. Especially for assumptions where smart people who have thought about the question extensively disagree strongly.
As a sidenote:
It's interesting to note that even though many people (such as yourself) have a "conservative" way of thinking (about things such as this) compared to me, I am still myself "conservative" in the sense that there are several things that have happened that would have seemed too "out there" to appear realistic to me.
Another sidenote:
One question we might ask ourselves is: "how many rules by which the universe could work would be consistent with e.g. the data we see on the internet?". And by rules here, I don't mean rules that can be derived from other rules (like e.g. the weight of a helium atom), but the parameters that most fundamentally determine how the universe works. If we...
...my (possibly wrong) guess is that there would be a "clear winner".
Even if my guess is correct, that leaves the question of whether finding/determining the "winner" is computationally tractable. With crude/naive search-techniques it isn't tractable, but we don't know the specifics of the techniques that a superintelligence might use - it could maybe develop very efficient methods for ruling out large swathes of search-space.
And a third sidenote (the last one, I promise):
Speculating about this feels sort of analogous to reasoning about a powerful chess engine (although there are also many disanalogies). I know that I can beat an arbitrarily powerful chess engine if I start from a sufficiently advantageous position. But I find it hard to predict where that "line" is (looking at a specific board position, and guessing if an optimal chess-player could beat me). Like, for some board positions the answer will be a clear "yes" or a clear "no", but for other board-positions, it will not be clear.
I don't know how much info and compute a superintelligence would need to make nanotechnology-designs that work in a "one short"-ish sort of way. I'm fairly confident that the amount of computational resources used for the initial moon-landing would be far too little (I'm picking an extreme example here, since I want plenty of margin for error). But I don't know where the "line" is.
Although keep in mind that "oneshotting" does not exclude being able to run experiments (nor does it rule out fairly extensive experimentation). As I touched upon earlier, it may be possible for a plan to have experimentation built into itself. Needing to do experimentation ≠ needing access to a lab and lots of serial time.