I upvoted your post because it seems relatively lucid and raises some important points, but would like to say that I'm in the middle of writing a pretty long, detailed explanation of why I agree with most of the gripes (e.g. AIs can't use magic to mine coal/build nanobots) and yet the object-level conclusions here are still untrue. In practice, I seriously doubt we would have more than a year to live after the release of AGI with the long-term planning and reasoning abilities of most accountants, even without FOOM. People here shouldn't assume that, because Eliezer never posted a detailed analysis on LessWrong, everyone on the doomer train is starting from unreasonable premises regarding how robot building and research could function in practice.
+1. If you don't write that post, I will. :)
And if you want feedback on your draft I'd be happy to give it a read and leave comments.
Related reading, if you're interested -- I tried to make these same arguments a few months ago:
The beginning of this post seems fairly good.
I agree that an AGI would need lots of trial and error to develop a major new technology.
I'm unsure whether an AGI would need to be as slow as humans about that trial and error. If it needs secrecy, that might be a big constraint. If it gets human cooperation, I'd expect it to improve significantly on human R&D speed.
I also see a nontrivial chance that humans will develop Drexlerian nanotech before full AGI.
Your post gets stranger toward the end.
I don't see much value in a careful engineering analysis of how an AGI might kill us. Most likely it would involve a complex set of strategies, including plenty of manipulation, with no one attack producing certain victory by itself, but with humanity being overwhelmed by the number of different kinds of attack. There's enough uncertainty in that kind of fight that I don't expect to get a consensus on who would win. The uncertainty ought to be scary enough that we shouldn't need to prove who would win.
I initially wrote a long comment discussing the post, but I rewrote it as a list-based version that tries to more efficiently parcel up the different objections/agreements/cruxes.
This list ended up basically just as long, but I feel it is better structured than my original intended comment.
(Section 1): How fast can humans develop novel technologies
A crux here seems to be the question of how well the AGI can simulate physical systems. If it can simulate them perfectly, there's no need for real-world R&D. If its simulations are below some (high) threshold fidelity, it'll need actors in the physical world to conduct experiments for it, and that takes human time-scales.
A big point in favor of "sufficiently good simulation is possible" is that we know the relevant laws of physics for anything the AGI might need to take over the world. We do real-world experiments because we haven't managed to write simulation software that implements these laws at sufficiently high fidelity and for sufficiently complex systems, and because the compute cost of doing so is enormous. But in 20 years, an AGI running on a giant compute farm might both write more efficient simulation codes and have enough compute to power them.
You're making two separate assumptions here, both found in the field of computational complexity:
For (1), it might suffice to show that some physical system can be approximated by some algorithm, even if the true system is not known to be computable. Computability is a property of formal systems.
It is an open question if all real world physical processes are computable. Turing machines are described using natural numbers and discrete time steps. Real world phenomena rely on real numbers and continuous time. Arguments that all physical processes are computable are based on discretizing everything down to the Planck time, length, mass, temperature, and then assuming determinism.
This axiom tends to come as given if you already believe the "computable universe" hypothesis, "digital physics", or "Church-Turing-Deutsch" principle is true. In those hypotheses, the entire universe...
It seems like you're relying on the existence of exponentially hard problems to mean that taking over the world is going to BE an exponentially hard problem. But you don't need to solve every problem. You just need to take over the world.
Like, okay, the three body problem is 'incomputable' in the sense that it has chaotically sensitive dependence on initial conditions in many cases. So… don't rely on specific behavior in those cases on long time horizons without the ability to do small adjustments to keep things on track.
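To make the "small adjustments" point concrete, here is a toy sketch. It uses the logistic map as a stand-in for a chaotic system (it is not the three body problem), and the disturbance size and correction gain are made-up illustrative values. Without correction, a tiny per-step disturbance grows until the trajectory is unrelated to the plan; with a modest feedback nudge toward the planned trajectory each step, the error stays tiny.

```python
def logistic(x):
    """One step of the chaotic logistic map x -> 4x(1-x) on [0, 1]."""
    return 4.0 * x * (1.0 - x)


def max_tracking_error(gain, steps=100, x0=0.2, disturbance=1e-10):
    """Worst gap between a planned trajectory and a disturbed one.

    gain = 0.0 means no course correction; gain = 0.9 means each step
    removes 90% of the current gap (a hypothetical controller).
    """
    ref = x0      # the planned trajectory
    actual = x0   # what really happens, with a small disturbance each step
    worst = 0.0
    for _ in range(steps):
        ref = logistic(ref)
        # Disturbed step, clamped so the state stays inside [0, 1].
        drifted = min(1.0, max(0.0, logistic(actual) + disturbance))
        # Feedback nudge toward the plan.
        actual = drifted + gain * (ref - drifted)
        worst = max(worst, abs(actual - ref))
    return worst


if __name__ == "__main__":
    print("no correction:   ", max_tracking_error(gain=0.0))
    print("with correction: ", max_tracking_error(gain=0.9))
```

The correction only works because it shrinks the error faster than the dynamics amplify it (here the map stretches errors by at most 4x per step, and the nudge cuts them by 10x). A real system would also need to measure its own state accurately enough to compute the nudge, which this sketch assumes for free.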
If the AI can detect most of the hard cases and avoid relying on them, and include robustness by having multiple alternate mechanisms and backup plans, even just 94% success on arbitrary problems could translate into better than that on an overall solution.
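The arithmetic behind that last claim can be sketched as follows. This assumes each attempt succeeds independently with the same probability, which is a strong assumption: correlated failures (the same blind spot breaking every backup) would weaken the effect. The step and backup counts are arbitrary illustrative values.

```python
def chained_success(p, steps):
    """Chance that every one of `steps` sequential steps succeeds."""
    return p ** steps


def redundant_success(p, attempts):
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


def plan_success(p, steps, attempts_per_step):
    """A plan of sequential steps, each with independent backup mechanisms."""
    return chained_success(redundant_success(p, attempts_per_step), steps)


if __name__ == "__main__":
    p = 0.94
    print("10 steps, no backups:      ", chained_success(p, 10))
    print("1 step, 3 mechanisms:      ", redundant_success(p, 3))
    print("10 steps, 3 mechanisms each:", plan_success(p, 10, 3))
```

Note that the sketch cuts both ways: a long chain of 94% steps with no backups is close to a coin flip, so the robustness has to come from the redundancy rather than from the per-problem success rate.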
Hi all, I'm really sorry I've not yet been able to read the whole list of comments and replies, but I'd like to raise the point that an intelligence one or more orders of magnitude greater than the existing ones can usually control them at will. We humans are able to control dogs and make them kill each other (dog fights) because we more or less understand how they react to different stimuli. I don't see why the AGI would need to spend so much time preparing robots. It could just keep an army of humans of whatever size it wants, and this army could perfectly well do anything it needs, given that the AGI is far superior to us in intelligence. Also, humans would probably never know that they're being commanded by an AGI; I don't feel it's too hard to convince a human to kill another human for a higher purpose. What I mean is that I think the whole point of analyzing the robots, etc. is useless. What should be analyzed is how long it would take an AGI to make humans believe they're fighting for a higher purpose (as in the Crusades, for example) and have an army of humans do whatever it takes. Of course that's not the end of humans, but at least it's the end of "free" humans (if that's something we are right now, which is also a matter of discussion...)
Sorry for my English, it's not my native tongue.
(minor corrections, sorry again)
This is an interesting essay and seems compelling to me. Because I am insufferable, I will pick the world's smallest nit.
The Wright Brothers took 4 years to build their first successful prototype. It took another 23 years for the first mass manufactured airplane to appear, for a total of 27 years of R&D.
That's true but artisanal airplanes were produced in the hundreds of thousands before mass manufacture. 200k airplanes served in WW1 just 15 years in. So call it 15 years of R&D.
Very nice post. One comment I'd add is that I have always been under the assumption that, by the time AGI is here, humans will have already achieved many of the things you say it would need time to create. I'm pretty sure we will have fully automated factories, autonomous military robots that are novel in close quarters, near perfect physics simulations, etc. by the time AGI is achieved.
Take the robots here for example. I think an AGI could potentially start making rapid advancements with the robots shown here: https://say-can.github.io/
15-20 years from now, do you really think an AGI would need to do much alteration to the top Chinese or American AI technologies?
I don't know, the bacteria example really gets me because working in biotech, it seems very possible and the main limitation is current lack of human understanding about all proteins' functions which is something we are actively researching if it can be solved via AI.
I imagine an AI roughly solving the protein function problem just as we have a rough solution for protein folding, then hacking a company which produces synthetic plasmids and slipping in some of its own designs in place of some existing orders. Then when those research labs receive their plas...
My response is 'the argument from the existence of new self made billionaires'.
There are giant holes in our collective understanding of the world, and giant opportunities. There are things that everyone misses until someone doesn't.
Something much smarter than human beings is simply going to be able to see things that we don't notice. That is what it means for it to be smarter than us.
Given how high dimensional the universe is, it would be really weird in my view if none of the things that something way smarter than us can notice don't point to highly c...
For the record, this post made me update towards slightly longer timelines (as in, by a year or two)
The question it comes down to is: How long are the feedback cycles of an AGI and what is their bandwidth?
I asked a question in this direction but there wasn't an answer: Does non-access to outputs prevent recursive self-improvement?
"Either they’re perfectly doable by humans in the present, with no AGI help necessary."
So, your argument about why this is a relevant statement is that AI isn't adding danger? That seems to me to be using a really odd standard for "perfectly doable": the actual number of humans who could do those things is not huge, and humans don't usually want to.
Like, either ending the world is easy for humans, in which case AI is dangerous because it will want to, or it's hard for humans, in which case AI is dangerous because it will do it better.
I don't think that works to dismiss that category of risk.
Thanks for the writeup. I feel like there's been a lack of similar posts and we need to step it up.
Maybe the only way for AI Safety to work at all is to analyze potential vectors of AGI attacks and try to counter them one way or the other. This seems like an alternative that doesn't contradict other AI Safety research, as it requires, I think, an entirely different set of skills.
I would like to see a more detailed post by "doomers" on how they perceive these vectors of attack and some healthy discussion about them.
It seems to me that AGI is not born Godl...
I agree that intelligence is limited without the right data, so the AI might need to engage in experiments to learn what it needs, but I imagine that a sufficiently smart system would be capable of thinking of innocent-seeming experiments, preferably ones that provide great benefits to humanity, that would allow it to acquire the data that it needs.
Self-improvement will also require a lot of trial and error, like training variants of neural networks and testing agents in simulation, if the AI doesn’t have a perfect theory of intelligence and if it is not a P=NP-difficult task.
Often when humans make a discovery through trial and error, they also find a way they could have figured it out without the experiments.
This is basically always the case in software engineering: any failure, from a routine failed unit test up to a major company outage, was obviously-in-retrospect avoidable by being smarter.
Humans are nonetheless incapable of developing large complex software systems without lots of trial and error.
I know less of physical engineering, so I ask non-rhetorically: does it not have the 'empirical results are foreseeable in retrospect' property?
It seems odd to suggest that the AI wouldn't kill us because it needs our supply chain. If I had the choice between "Be shut down because I'm misaligned" (or "Be reprogrammed to be aligned" if not corrigible) and "Have to reconstruct the economy from the remnants of human civilization," I think I'm more likely to achieve my goals by trying to reconstruct the economy.
So if your argument was meant to say "We'll have time to do alignment while the AI is still reliant on the human supply chain," then I don't think it works. A functional AGI would rather destro...
TL;DR: Hacking
Doesn't require trial and error in the sense you're talking about. Totally doable. We're good at it. Just takes time.
What good are humans without their (internet connected) electronics?
How harmless would an AGI be if it had access merely to our (internet connected) existing weapons systems, to send orders to troops, and to disrupt any supplies that rely on the internet?
What do you think?
EY published an article last week titled “AGI Ruin: A List of Lethalities”, which explains in detail why you can’t train an AGI that won’t try to kill you at the first chance it gets, as well as why this AGI will eventually appear given humanity’s current trajectory in computer science. EY doesn’t explicitly state a timeline over which AGI is supposed to destroy humanity, but it’s implied that this will happen rapidly and humanity won’t have enough time to stop it. EY doesn’t find the question of how exactly AGI will destroy humanity too interesting and explains it as follows:
Let’s break down EY’s proposed plan for “Skynet” into the requisite engineering steps:
The plan above looks great for a fiction book and EY is indeed a great fiction writer in addition to his Alignment work, but there’s one unstated assumption: the AGI will not only be able to design everything using whatever human data it has available, but it will also execute the evil plan without needing lots of trial and error like mortal human inventors do. And surprisingly this part of EY’s argument gets little objection. A visual representation of my understanding of EY’s mental model of AGI vs. Progress is as follows:
How fast can humans develop novel technologies?
Humans are the only known AGI that we have available for reference, so we could look at the fastest known examples of novel engineering to see how fast an AGI might develop something spectacular and human-destroying. Patrick Collison of Stripe keeps a helpful page titled “Fast” with notable “examples of people quickly accomplishing ambitious things together”. The engineering entries include:
Sounds very quick? Definitely, but the problem is that Patrick’s examples are all for engineering constructs building on top of decades of previous work. Designing a slightly better airplane in 1944 is not the same as creating the very first airplane in 1903, as by 1944 humans had four decades of experience to build on. And if your task is to build diamondoid bacteria manufactured by a protein-based nanomachinery factory you’re definitely in Wright Brothers territory. So let’s instead look at timelines of novel technologies that had little prior research and infrastructure to fall back on:
Now… you might object to this by correctly calling out the downside of human R&D:
And this is all true! Humans are nothing to a hypothetical team of AGIs. But the problem is… until AGI can build its fantastical diamondoid bacteria, it remains dependent on imperfect human hands to conduct its R&D in the real world, as they’ll be the only way for AGI to interact with the physical world for a very long time. Remember that AGI’s one downside is that it will be running on motionless computers, unlike humans who have been running around with 4 limbs since the beginning of civilization. Which in turn brings us to the 30+ years timeline of developing a novel engineering construct, no matter how smart the AI will be.
Unstoppable intellect meets complexity of the universe
Plenty of content has been written about how human scientific progress is slowing down, my favorite being WTF Happened in 1971 and Scott’s 2018 post Is Science Slowing Down? In the second article Scott brings up the paper Are Ideas Getting Harder to Find? by Bloom, Jones, Van Reenen & Webb (2018), which has the following neat graph:
We can see how the amount of investment into R&D is growing every year, but productive research is more or less flat. The paper brings up a relatable example in the section on semiconductor research:
Not even AGI could get around this problem and would likely require an exponentially growing amount of resources as it delves deeper into engineering and fundamental research. It is definitely true that AGI itself will be rapidly increasing its intellect, but can this really continue indefinitely? At some point all the low hanging fruit missed by human AI researchers will be exhausted and AGI will have to spend years in real world time to make significant improvements of its own IQ. Granted, AGI will rapidly reach an IQ far beyond human reach, but all this intellectual power will still have to contend with the difficulties of novel research.
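A back-of-the-envelope sketch of what "exponentially growing resources" means here. It assumes a simple model in which progress per year equals research productivity times research effort, and productivity declines by a fixed fraction each year. The 7% decline rate and 40-year horizon below are illustrative assumptions chosen for the example, not figures taken from this post.

```python
def effort_multiplier(annual_decline, years):
    """How much research effort must grow to keep progress constant.

    Model (an assumption): progress = productivity * effort, and
    productivity shrinks by `annual_decline` (a fraction) every year.
    Holding progress fixed then requires effort to scale as
    1 / (1 - annual_decline) ** years.
    """
    return 1.0 / (1.0 - annual_decline) ** years


if __name__ == "__main__":
    for years in (10, 20, 40):
        print(years, "years ->", round(effort_multiplier(0.07, years), 1), "x effort")
```

Under this toy model, even a single-digit yearly decline in research productivity compounds into more than an order of magnitude of extra effort over a few decades, just to keep the same pace.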
What does AGI want?
Since AGI development is completely decoupled from mammalian evolution here on Earth, it’s quite likely to eventually exhibit “blue and orange” morality, behaving in a completely alien and unpredictable fashion, with no humanly understandable motivations or a way for humans to relate to what the AGI wants. That being said, AGI is likely to fall into one of two buckets regardless of its motivations:
Let’s start with scenario #1 by looking at… the common pencil.
What does it take to make a pencil?
A classic pamphlet called I, Pencil walks us through what it takes to make a common pencil from scratch:
The point of this entire story is that making something as simple as a pencil requires a massive supply chain employing tens of millions of non-AGI humans. If you want any hope of continuing to exist, you need to replace the labor of this gigantic global army of humans with AGI-controlled robots or “diamondoid bacteria” or whatever other magical contraption you want to invoke. That will require lots of trial & error and decades of building out a reliable AGI-controlled supply chain that could be reused to fight humans at the drop of a hat. Otherwise AGI will risk seeing its brilliant plan fail, resulting in humans going berserk against any machines capable of running said AGI and ending its reign over Earth long before it has a chance to start in earnest. And if the AGI doesn’t understand this… how smart is it really?
YOLO AGI?
But what if the AGI is absolutely ruthless and doesn’t care if it goes up in flames as soon as humans are gone? Then we could get to the end of humanity much faster with options like:
The problem with all these scenarios is similar:
Either they’re perfectly doable by humans in the present, with no AGI help necessary. E.g. we were barely saved from WW3 by a Soviet officer, long before AGI was on anyone’s mind. So at worst AGI will somewhat increase the risks of this happening in the short term... Or they require lots of trial & error to develop into functional production-ready technologies, once again creating a big problem for AGI, as it has to rely on imperfect humans to do the novel R&D. This will still take decades, even if AGI won’t worry about a full takeover of supply chains.
But what about AlphaFold?
Another possible counter-argument is that AGI will figure out the laws of the universe through internal modeling and will be able to simulate and perfect its amazing inventions without needing trial & error in the physical world. EY mentions AlphaFold as an example of such a breakthrough. If you haven’t heard about it, here’s a description of the Protein Folding Problem from Wiki that AlphaFold 2 solved better than any other prior system back in 2020:
According to EY, the existence of AlphaFold shows that a smart enough AGI could eventually learn to manipulate proteins into “nanofactories” that could be used to interact with the physical world. However the current version still has major limitations:
In other words, there’s still a huge leap between “can predict simple protein structures” and “can design protein nanofactories without experimentation”. AGI will likely need to spend decades managing laboratory experiments to fill in the gaps around our understanding of how proteins work. And don’t forget that currently available commercial protein printers are not perfect, especially if you’re trying to print a novel structure of far bigger complexity than anything else on the planet. Also see this excellent comment on the subject by anonymousaisafety.
What if AGI settles for a robot army?
Cybernetic army from I, Robot
We could also think of the diamondoid bacteria as just an example of what the AI can do and turn to other ways it could manipulate physical reality that are closer to the technology we already have today. There are impressive videos of Boston Dynamics robots doing all kinds of stunts, so we could ask if perhaps AGI could utilize their existing progress to quickly give itself a way to interact with the outside world. However this would still involve many roadblocks:
[added] Also see this excellent comment by anonymousaisafety explaining why "just take over the human factories" is not a quick path to success (slightly edited below):
My prediction is that it will take AGI at least 30 years of effort to get to a point where it can comfortably rely on the robots to interact with the physical world and not have to count on humans for its supply chain needs.
[added] What if AGI just simulates our physical world?
This idea goes hand-in-hand with the idea that AlphaFold is the answer to all challenges in bioengineering. There are two separate assumptions here, both found in the field of computational complexity:
I don't think these assumptions are reasonable. For a full explanation see this excellent comment by anonymousaisafety.
Mere mortals can’t comprehend AGI?
Another argument is that AGI will achieve such an incomprehensible level of intellect that it will become impossible to predict what it will be capable of. I mean, who knows, maybe with an IQ of 500 you could just magically turn yourself into a God and destroy Earth with a Thanos-style snap of your fingers? But I contend that even a creature with an IQ of 500 will be inherently limited by our physical universe and won’t magically gain omniscience by virtue of its intellect alone. It will instead have to spend decades weaning itself off humans as a proxy, no matter how smart it could potentially be.
Does this mean EY is wrong and AGI is not a threat?
I believe that EY is only wrong about handwaving the difficulties of growing from a computer-based AGI to an AGI capable of operating independently from the human race. In the long-term his predictions will likely come true, once AGI has enough time to go through the difficult R&D cycle of building the nanofactories and diamondoid bacteria. My predicted timeline is as follows:
Updated version of the original progress graph
I’m hoping that the AI Alignment movement tries to spend more time on the low level engineering details of “humanity goes poof” rather than handwaving everything away via science fiction concepts. Because otherwise it’s hard to believe that the FOOM scenario could ever come to fruition. And if FOOM is not the real problem, perhaps we could save humanity by managing AGI’s interactions with the physical world more carefully once it appears?