This makes assumptions that make no sense to me. Auto-GPT is already not passively safe, and there is no reason to be sure LLMs would remain myopic as they are scaled. LLMs are inscrutable matrices of floating-point numbers that we are barely learning how to understand and interpret. We have no reliable way to predict when LLMs might hallucinate or misbehave in some other way. There is also no "human level" - LLMs are way faster than humans and are way more scalable than humans - there is no way to get LLMs that are as good as humans without having something that's way better than humans along a huge number of dimensions.
I love it when comments include arguments I have already raised in my "Some obvious objections to this argument" section.
Auto-GPT is already not passively safe, and there is no reason to be sure LLMs would remain myopic as they are scaled.
I agree with you that Auto-GPT is not passively safe/myopic. However, as I pointed out, AI agents "only optionally mitigate myopia and passive safety." If myopia and passive safety are critical safety guarantees, it's easy to include them in AI agents.
LLMs are inscrutable matrices of floating-point numbers that we are barely learning how to understand and interpret.
This simply isn't true. I would encourage you to keep up to date with the latest research on AI interpretability. LLMs are highly interpretable: not only can we understand their world models, we can also detect whether or not they believe a statement to be true and whether or not they are lying.
More importantly, LLMs are much easier to interpret than biological systems (the product of evolution). The argument here is that we should scale up (relatively) easy-to-interpret LLMs now before the arrival of evolution-based AIs.
There is also no "human level" - LLMs are way faster than humans and are way more scalable than humans - there is no way to get LLMs that are as good as humans without having something that's way better than humans along a huge number of dimensions.
I'm not sure what point you're trying to make here. The question of importance isn't whether Deep-Learning models will ever be exactly human-level. The question is whether we can use them to safely augment human intelligence in order to solve the Alignment Problem.
I agree that LLMs are super-human on some dimensions (fact recall) and inferior to humans on others (the ability to play Connect 4). Therefore, if an LLM (or AI agent) were at least human-level on all dimensions, it would naturally be super-human on at least some of them. This fact alone doesn't tell us whether or not LLMs are safe to use.
I think that we have very strong reasons to believe that a GPT-N style architecture would be highly safe and, more importantly, that it would be far safer and more interpretable than an equally powerful AI modeled after the human brain, or chosen randomly by evolution.
This solves nothing that could not be better solved by freezing development of hardware, which would also slow down evolutionary setups.
This also allows for more time for safer approaches such as genetic engineering and biological advancements to catch up, and keep us from Killing Everyone.
If your argument is that a race of genetically engineered super-humans is less likely to cause human extinction than GPT-5, the Neanderthals would like to have a word with you.
But notably, we have not killed all biological life, and we are substantially Neanderthal. Versus death by AI, it's a far better prospect.
Manifold currently estimates that there is a 4% chance GPT-5 will destroy the world. What percent chance do you estimate there is that a genetically engineered race of super-humans will cause human extinction?
TLDR: This is really just a longer version of this comment.
A metaphor
You are Rocket McRocket-Face, the CEO of Rockets Inc, the world’s largest and most reputable rocket company. Rockets Inc isn’t the only rocket company in the world, but it is by far the biggest, richest, and most powerful rocket company on Earth. No other rocket company holds a candle to Rockets Inc. Nor is any likely to in the next 3-5 years.
Rocket McRocket-Face, CEO of Rockets Inc.
As the CEO of Rockets Inc, you dream of one day reaching the moon. Your reasons for dreaming are twofold. The first reason is that reaching the moon is a dream that men have had since ancient times. It is an achievement truly worthy to behold. The second reason is a bit more practical. It is widely agreed in the world of rocket science that the first person to reach the moon will hold a commanding advantage. From the heights of the moon, they will be virtually unconquerable, able to hurl moon rocks down to punish any of their enemies.
Picture of Rocket McRocket-Face looking at a map of the moon
One day, two scientists come to you with research proposals for new types of rockets.
The first scientist is Mathy McEngineer. Mathy is one of your best engineers. He is well known for the reliability of his rockets. His design is comforting, simple-to-understand, and a natural extension of currently known rocket technologies. The design is so simple it can be explained to anyone with a degree in rocket-engineering in a few minutes. It involves taking the current well-known and trustworthy rocket designs and adding a few more parts: more engines, more fuel. Nothing out of the ordinary.
Mathy’s plan may not be brilliant, but it’s trustworthy and safe. And there’s a good chance that it will reach the moon (although it’s unlikely to reach even the nearest star).
Mathy McEngineer
After your meeting with Mathy, you are feeling good about your chances of winning the race to the moon and decide to leave work early that day. On your way out of work, you are suddenly stopped by Subtle McGenius. Immediately the good feeling you had vanishes as you feel a ball of anxiety in the pit of your stomach. You hate meeting with Subtle McGenius. He doesn’t even work for Rockets Inc (you fired him years ago), but somehow he always manages to show back up at the worst possible time. Times like right now.
Subtle McGenius
“You’re going to go with Mathy’s plan. I just know you are. Aren’t you?” raves Subtle.
“Get out of my hallway!” You murmur. “Guards!”
“Mathy’s plan may seem good, but it will only take you to the moon. You will never reach the stars!”
Thankfully, the guards arrive in moments (you pay them well). As they drag Subtle away kicking and screaming, he shouts out one last retort.
“You can’t stop me!” Subtle shouts. “One day I’ll build my rocket. And then… then you’ll learn!”
Subtle Being Dragged Away by the guards
There was no point in listening to Subtle, however. You’ve heard his plans for a new “evolutionary” rocket design a million times. Like his name, his plans are subtle, crafty, and impossible to understand. Even when Subtle’s plans work, nobody knows why they work. You’ve invested billions of dollars trying to understand Subtle’s plans and failed. Even if Subtle’s plans work, there’s no way to make them safe enough to risk a human life on.
As you finish your ninth hole of golf, you feel good about yourself. Soon, with Mathy’s rocket, you will be golfing on the moon. Even if it will never take you to the stars.
Rocket McRocket-Face golfing on the moon
The metaphor explained
Rockets in this metaphor represent AI research.
CEO Rocket McRocket-Face represents the audience. But the real-world person most like him is probably Sam Altman, CEO of OpenAI.
The moon represents human-level artificial intelligence. It is widely agreed that the first person or company to build an AI with human intelligence will gain a commanding lead over the rest of the world.
Mathy, and his rocket, represent Deep-Learning. Adding engines and fuel represents scaling (adding more data and compute).
Subtle, and his rocket, represent evolution. While the designs produced by evolution are brilliant, even the simplest products of evolution are too complex for humans to understand.
Why this is a (notKillEveryoneIsm) argument for accelerating Deep Learning Research
The Deep-Learning paradigm is about as good as it gets from an AI Safety perspective. Deep-learning models are extremely logically simple. They are easy to interpret. They are extremely malleable to human control. And they are inherently myopic, which means they do not threaten to replace humanity. Finally, because they rely on huge data centers full of GPUs, deep-learning models are easy to shut down.
By contrast, we know that evolution is capable of building much more intelligent and dangerous systems. Not only did evolution already produce human beings, but there is no inherent limitation on the types of algorithms that evolution can produce. If it is possible to develop dangerous super-intelligent AI, eventually evolution will find a way.
While evolution has the advantage in the long term, Deep-Learning currently holds the lead. I would argue that, if our only goal is to prevent the extinction of the human race, we should attempt to extend this lead as much as possible.
The risk (of not accelerating Deep Learning) is that as compute power grows, it will eventually be possible for someone (Subtle in our story) to run a simulation of evolution on their computer that invents a new, more dangerous AI architecture. The algorithms produced by evolution are unlikely to be as easy to understand as current Deep-Learning models. Nor are they likely to be friendly towards human beings.
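To see why this threshold matters, note how little code the core of an evolutionary search requires. Below is a purely illustrative (1+1) evolutionary loop in Python; it optimizes a toy bit-string objective rather than an AI architecture, and every name in it is invented for this sketch:

```python
# Minimal (1+1) evolutionary search: keep one parent, mutate it,
# and keep the child whenever it scores at least as well.
import random

def fitness(genome):
    # Stand-in objective: count of 1-bits. A real "evolved AI" run
    # would score candidate architectures instead, at vastly more cost.
    return sum(genome)

def evolve(length=20, generations=200, seed=0):
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(length)]
    for _ in range(generations):
        # Flip each bit independently with probability 1/length.
        child = [bit ^ (rng.random() < 1.0 / length) for bit in parent]
        if fitness(child) >= fitness(parent):  # elitist selection
            parent = child
    return parent

best = evolve()
print(fitness(best))  # typically close to the maximum of 20
```

The loop itself is trivial; everything hard (and expensive) hides inside the fitness function, which is exactly why falling compute prices are the gating factor.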
The benefit (of accelerating Deep Learning research) is that by increasing the intelligence available for humans to command, we have a better chance of solving the problem of friendly AI before someone develops a more powerful (and deadly) alternative paradigm.
Some obvious objections to this argument
That would mean Deep Learning models are inherently safe. This would only make the case for accelerating Deep Learning even stronger.
None of these are existential threats to the survival of the human race, so they are not objections to this argument.
Algorithmic improvements in Deep Learning are orthogonal to hardware advances relevant to evolutionary algorithms. If you plan to limit future increases in hardware performance that’s fine. But it’s irrelevant to the question of whether we should train the best Transformer we possibly can.
If your opinion is that increasing the total productive capacity of the human race is bad, history is not on your side.
There are five reasons to be optimistic about the LLM control problem: interpretability, passive safety, myopia, ease of feedback, and shutdown-ability. LLM agents (if they work) mitigate two of these: myopia and passive safety. The other three are still sufficient and, more importantly, much better than the safety guarantees we can expect for evolutionary algorithms (none).
Moreover, AI Agents only optionally mitigate myopia and passive safety. It is possible to build an AI agent that is passively safe (by requiring that it ask for permission) and myopic (by requiring it to consider only effects that are bounded in space and time).
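As a sketch of what those two guarantees could look like in an agent loop (all names here are hypothetical; `plan_next_action` stands in for whatever LLM call a real agent framework would make):

```python
from dataclasses import dataclass

MAX_STEPS = 5  # myopia: a hard bound on the planning horizon

@dataclass
class Action:
    description: str

def plan_next_action(goal: str, step: int) -> Action:
    # Placeholder for an LLM call; returns a dummy action here.
    return Action(description=f"step {step + 1} toward: {goal}")

def ask_human(action: Action) -> bool:
    # Passive safety: nothing executes without explicit human approval.
    reply = input(f"Execute '{action.description}'? [y/N] ")
    return reply.strip().lower() == "y"

def run_agent(goal: str, approve=ask_human) -> list[str]:
    executed = []
    for step in range(MAX_STEPS):       # bounded in time
        action = plan_next_action(goal, step)
        if not approve(action):         # human veto on every step
            break
        executed.append(action.description)
    return executed
```

The point is not that this toy loop is safe, only that both properties are a few lines of scaffolding: a step budget for myopia, and an approval gate for passive safety.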
Haste can be cautious. Moving slowly can be dangerous.
You’re right. But it puts us in a much better position to be able to solve that problem.
Conclusion
I believe that if you are only concerned about existential risk from AI, you should take actions that maximize the acceleration of Deep Learning models (aside from accelerating new types of general-purpose compute hardware). This means training the largest models you can, optimizing algorithms for Deep Learning, and deploying Deep Learning as widely as possible throughout the economy.
I do acknowledge that there are other side-effects of widely deploying Deep Learning AI models. These include using AI for targeted information warfare, the loss of jobs, and other forms of social disruption caused by AI. However, most of these happen when we deploy AI models, not when we develop them. Furthermore, it is likely that the benefits of Deep Learning AI models vastly outweigh the drawbacks.
As a policy proposal, I would recommend “full steam ahead” on training and researching Deep Learning Models combined with “careful but rapid” deployment of these models in the economy.
Subtle McGenius imprisoned next to Mathy McEngineer and his rocket