This piece, which predates ChatGPT, is no longer endorsed by its author.
Eliezer's recent discussion on AGI alignment is not optimistic.
I consider the present gameboard to look incredibly grim... We can hope there's a miracle that violates some aspect of my background model, and we can try to prepare for that unknown miracle
For this post, instead of debating Eliezer's model, I want to pretend it's true. Let's imagine we've all seen satisfactory evidence for the following:
- AGI is likely to be developed soon*
- Alignment is a Hard Problem. Current research is nowhere close to solving it, and this is unlikely to change by the time AGI is developed
- Therefore, when AGI is first developed, it will only be possible to build misaligned AGI. We are heading for catastrophe
How we might respond
I don't think this is an unsolvable problem. In this scenario, there are two ways to avoid catastrophe: massively increase the pace of alignment research, and delay the deployment of AGI.
Massively increase the pace of alignment research via 20x more money
I wouldn't rely solely on this option. Lots of brilliant and well-funded people are already trying really hard! But I bet we can make up some time here. Let me pull some numbers out of my arse:
- $100M is spent per year on alignment research worldwide (this is a guess, I don't know the actual number)
- Our rate of research progress is proportional to the square root of our spending. That is, to double progress, you need to spend 4x as much**
Suppose we spent $2B a year, a 20x increase. Since progress scales with the square root of spending, that's a √20 ≈ 4.5x speedup: we'd accomplish in 5 years what would otherwise have taken 22 years.
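To make the arithmetic behind that claim explicit, here's a minimal sketch under the post's own assumptions (a $100M/year baseline and the guessed square-root scaling law; the function names are mine, not anything standard):

```python
import math

def speedup(new_budget, old_budget):
    """Research speedup under the assumed square-root scaling law:
    doubling progress requires 4x the spending."""
    return math.sqrt(new_budget / old_budget)

def baseline_years_of_progress(calendar_years, new_budget, old_budget):
    """How many baseline-years of progress fit into `calendar_years`
    at the new budget, given the scaling assumption above."""
    return calendar_years * speedup(new_budget, old_budget)

# Assumed figures from the post: $100M/year baseline, scaled up to $2B/year.
print(speedup(2e9, 1e8))                          # ~4.47x faster
print(baseline_years_of_progress(5, 2e9, 1e8))    # ~22.4 baseline-years in 5 calendar years
```

The same scaling law also shows the diminishing returns: getting a 10x speedup this way would require 100x the baseline budget.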
$2B a year isn't realistic today, but it's realistic in this scenario, where we've seen persuasive evidence Eliezer's model is true. If AI safety is the critical path for humanity's survival, I bet a skilled fundraiser can make it happen.
Of course, skillfully administering the funds is its own issue...
Slow down AGI development
The problem, as I understand it:
- Lots of groups, like DeepMind, OpenAI, Huawei, and the People's Liberation Army, are trying to build powerful AI systems
- No one is very far ahead. For a number of reasons, it's likely to stay that way:
  - We all have access to roughly the same computing power, within an order of magnitude
  - We're all seeing the same events unfold in the real world, leading us to similar insights
  - Knowledge tends to proliferate among researchers. This is in part a natural tendency of academic work, and in part a deliberate effort by OpenAI
- When one group achieves the capability to deploy AGI, the others will not be far behind
- When one group achieves the capability to deploy AGI, they will have powerful incentives to deploy it. AGI is really cool, will make a lot of money, and the first to deploy it successfully might be able to impose their values on the entire world
- Even if they don't deploy it, the next group still might. If even one chooses to deploy, a permanent catastrophe strikes
What can we do about this?
1. Persuade OpenAI
First, let's try the low hanging fruit. OpenAI seems to be full of smart people who want to do the right thing. If Eliezer's position is true, then I bet some high status rationalist-adjacent figures could be persuaded. In turn, I bet these folks could get a fair listen from Sam Altman/Elon Musk/Ilya Sutskever.
Maybe they'll change their mind. Or maybe Eliezer will change his own mind.
2. Persuade US Government to impose stronger Export Controls
Second, US export controls can buy time by slowing down the whole field. They'd also make it harder to share your research, so the leading team accumulates a bigger lead. They're easy to impose: it's a regulatory move, so an act of Congress isn't required. There are already export controls on narrow areas of AI, like automated imagery analysis. We could impose export controls on areas likely to contribute to AGI and encourage other countries to follow suit.
3. Persuade leading researchers not to deploy misaligned AI
Third, if the groups deploying AGI genuinely believed it would destroy the world, they wouldn't deploy it. I bet a lot of them are persuadable in the next 2 to 50 years.
4. Use public opinion to slow down AGI research
Fourth, public opinion is a dangerous instrument. Giving AGI the same political prominence (and epistemic habits) as climate change research would make a lot of folks miserable. But I bet it could delay AGI by quite a lot.
5. US commits to using the full range of diplomatic, economic, and military action against those who violate AGI research norms
Fifth, the US has a massive array of policy options for nuclear nonproliferation. These range from sanctions (like the ones crippling Iran's economy) to war. Right now, these aren't an option for AGI, because the foreign policy community doesn't understand the threat of misaligned AGI. If we communicate clearly and in their language, we could help them understand.
What now?
I don't know whether the grim model in Eliezer's interview is true or not. I think it's really important to find out.
If it's false (alignment efforts are likely to work), then we need to know that. Crying wolf does a lot of harm, and most of the interventions I can think of are costly and/or destructive.
But if it's true (current alignment efforts are doomed), we need to know that in a legible way. That is, it needs to be as easy as possible for smart people outside the community to verify the reasoning.
*Eliezer says his timeline is "short," but I can't find specific figures. Nate Soares gives a very substantial chance of 2 to 20 years and is 85% confident we'll see AGI by 2070
**Wild guess, loosely based on Price's Law. I think this works as long as we're nowhere close to exhausting the pool of smart/motivated/creative people who can contribute
After seeing a number of rather gloomy posts on the site in the last few days, I feel a need to point out that problems we don't currently know how to solve always look impossible. A smart guy once pointed out how silly it was that Lord Kelvin claimed "The influence of animal or vegetable life on matter is infinitely beyond the range of any scientific inquiry hitherto entered on." Kelvin just didn't know how to do it. That's fine. Deciding it's a Hard Problem just sort of throws up mental blocks against finding potentially obvious solutions.
Maybe alignment will seem really easy in retrospect. Maybe it's the sort of thing that requires only two small insights that we don't currently have. Maybe we already have all the insights we need and somebody just needs to connect them together in a non-obvious way. Maybe somebody has already had the key idea, and just thought to themselves, no, it can't be that simple! (I actually sort of viscerally suspect that the lynchpin of alignment will turn out to be something really dumb and easy that we've simply overlooked, and not something like Special Relativity.) Everything seems hard in advance, and we've spent far more effort as a civilization studying asphalt than we have alignment. We've tried almost nothing so far.
In the same way that we have an existence-proof of AGI (humans existing), we also have a highly suggestive example of something that looks a lot like alignment (humans existing and often choosing not to do heroin), except probably not robust to infinite capability increase, blah blah.
The "probabilistic mainline path" always looks really grim when success depends on innovations and inventions you don't currently know how to do. Nobody knows what probability to put on obtaining such innovations in advance. If you asked me ten years ago I would have put the odds of SpaceX Starship existing at like 2%, probably even after thinking really hard about it.
When I look at the world today, it really doesn't seem like a ship steered by evolution. (Instead it is a ship steered by no one, chaotically drifting.) Maybe if there is economic and technological stagnation for ten thousand years, then evolution will get back in the driver's seat and continue the long slow process of aligning humans... but I think that's very much not the most probable outcome.