Summary: Aligning a single powerful AI is not enough: we are only safe if no one, ever, can build an unaligned powerful AI. Yudkowsky tried to solve this with the pivotal act: the first aligned AI does something (such as melting all GPUs) that ensures no unaligned AI can ever be built, by anyone. However, the labs do not currently appear to be aiming to implement a pivotal act. That means that aligning an AGI, while creating lots of value, would not reduce existential risk. Instead, global hardware/data regulation is what is needed to reduce existential risk. Therefore, those aiming to reduce AI existential risk should focus on AI Regulation rather than on AI Alignment.
Epistemic status: I’ve been thinking about this for a few years, while working professionally on x-risk reduction. I think I know most of the literature on the topic. I have also discussed the topic with a fair number of experts (some of whom seemed to agree, and some of whom did not).
Thanks to David Krueger, Matthijs Maas, Roman Yampolskiy, Tim Bakker, Ruben Dieleman, and Alex van der Meer for helpful conversations, comments, and/or feedback. These people do not necessarily share the views expressed in this post.
This post is mostly about AI x-risk caused by a take-over. It may or may not be valid for other types of AI x-risks. This post is mostly about the ‘end game’ of AI existential risk, not about intermediate states.
AI existential risk is an evolutionary problem. As Eliezer Yudkowsky and others have pointed out: even if there are safe AIs, they are irrelevant, since they will not prevent others from building dangerous AIs. Examples of safe AIs could be oracles or satisficers, insofar as it turns out to be possible to combine these AI types with high intelligence. But, as Yudkowsky would put it: “if all you need is an object that doesn't do dangerous things, you could try a sponge”. Even if a limited AI were safe, it would not reduce AI existential risk, because at some point someone would create an AI with an unbounded goal (create as many paperclips as possible, predict the next word in the sentence with unlimited accuracy, etc.). That AI is the one that would kill us, not the safe one.
This is the evolutionary nature of the AI existential risk problem. It is described excellently by Anthony Berglas in his underrated book, and more recently in Dan Hendrycks’ paper. This evolutionary dynamic is a fundamental property of AI existential risk and a large part of why the problem is difficult. Yet many in AI Alignment and industry seem to focus only on aligning a single AI, which I think is insufficient.
Yudkowsky aimed to solve this evolutionary problem (the fact that no one, ever, should build an unsafe AI) with the so-called pivotal act. An aligned superintelligence would not only refrain from killing humanity, it would also perform a pivotal act, the toy example being to melt all GPUs globally, or, as he later put it, to subtly change all GPUs globally so that they can no longer be used to create an AGI. This is the act that would actually save humanity from extinction, by making sure no unsafe superintelligence is ever created, by anyone. (It may be argued that melting all GPUs, and all other future hardware that could run AI, would need to be done indefinitely by the aligned superintelligence; otherwise even a pivotal act may be insufficient.)
The concept of a pivotal act, however, seems to have gone thoroughly out of fashion. None of the leading labs, AI governance think tanks, governments, etc. are talking or, apparently, thinking much about it. Rather, they seem to be thinking about things like non-proliferation and various types of regulation, to make sure powerful AI does not fall into the wrong hands, meaning anyone who might run it, on purpose or by mistake, without safety measures. I would call any such solution, specifically any solution that has the capacity to limit any actor’s access to advanced AI for any period of time, AI Regulation.
This solution, which appears to have become mainstream, has important consequences:
- Even if we had solved alignment, we would still need AI Regulation, since otherwise unsafe actors, which abound, could run a superintelligence without appropriate safety measures, risking a take-over.
- If we have AI Regulation anyway, we could also use it to deny everyone access to advanced AI, including the leading labs, instead of denying access to almost everyone. This amounts to an AI Pause.
- If we can deny everyone access to advanced AI, and we can keep doing so, we have solved AI existential risk, even without ever solving AI Alignment.
- Successfully aligning a superintelligence without performing a pivotal act would hardly change the regulations that would need to be in place, since they would still be needed for everyone other than the handful of labs deemed safe.
Therefore, without a pivotal act, what keeps us safe is regulation. One might still want to align a superintelligence in order to use its power, but not to prevent existential risk. Using a superintelligence’s power may of course be a valid reason to pursue alignment: it could skyrocket our economy, create abundance, cure disease, increase political power, etc. Although the net positivity of these enormous, and enormously complex, transformations may be hard to prove in advance, these could certainly be legitimate reasons to work on alignment. However, those of us interested in preventing existential risk, as opposed to building AI, should in this scenario focus on regulation, not on alignment. Alignment might also be left to industry, along with the burden of proving that the resulting aligned AIs are indeed safe.
Moving beyond this scenario of AI Regulation, there is one more option to solve the full evolutionary problem of AI existential risk. Some think that aligned superintelligences could successfully and indefinitely protect us from unaligned superintelligences. This option, which I would call a positive offense/defense balance, would be a third way, next to alignment + pivotal act and lasting regulation, to prevent human extinction in the longer term. Most people do not seem to think that this would be realistic, however (with notable exceptions).
These three ways of addressing the evolutionary nature of AI existential risk (AI alignment + pivotal act, AI regulation, defense > offense) might not be the complete set of solutions, and there are intersections between the three. The pivotal act might be seen as a (very restrictive, and illegal) way of winning the offense/defense balance. A pivotal act carried out by a state actor might be seen as an extreme (and again illegal) way of implementing AI regulation. Types of AI (hardware) regulation may be possible in which the state actors implementing the regulation are aided by aligned AIs, making their implementation somewhat similar to a pivotal act (which in this case would probably be legal). And certain types of regulation may make it more likely that we win the offense/defense balance.
I think research should be carried out that aims for a complete set of solutions to the evolutionary problem of AI existential risk. I would expect such research to come up with more options than these three, and/or with hybrid options between them, which may point to new, fruitful ways of reducing AI existential risk.
As long as we assume that only these three solutions exist to the evolutionary nature of AI existential risk, it is important to realize that all three seem difficult, and that it is hard to quantify the likelihood of each succeeding. Therefore, placing bets on any of the three could be worthwhile.
My personal bet, however, is that offense will unfortunately trump defense. I also think the chance that alignment will be solved before a superintelligence with takeover capabilities is developed, and that this aligned superintelligence will carry out a successful pivotal act, is smaller than the chance that we will be able to coordinate successfully and implement good enough hardware or data regulation, especially if the current trend of increasing public awareness of AI existential risk continues. This implies that working on regulation of the type that could globally limit access to advanced AI for all actors, for as long as necessary, should be the highest existential priority, more so than working on alignment.