AI x-risk, approximately ordered by embarrassment
Advanced AI systems could lead to existential risks via several different pathways, some of which may not fit neatly into traditional risk forecasts. Many previous forecasts, for example the well-known report by Joe Carlsmith, decompose a failure story into a conjunction of different claims, and in doing so risk missing some important dangers. ‘Safety’ and ‘Alignment’ are both now used by labs to refer to things which seem far enough from existential risk reduction that using the term ‘AI notkilleveryoneism’ instead is becoming increasingly popular among AI researchers who are particularly focused on existential risk.

This post presents a series of scenarios that we must avoid, ranked by how embarrassing it would be if we failed to prevent them. Embarrassment here is clearly subjective, and somewhat unserious given the stakes, but I think it gestures reasonably well at a cluster of ideas which are important, and often missed by the kind of analysis which proceeds via weighing the incentives of multiple actors:

* Sometimes, easy problems still don’t get solved on the first try.
* An idea being obvious to nearly everyone does not mean nobody will miss it.
* When one person making a mistake is sufficient for a catastrophe, the relevant question is not whether the mistake will be obvious on average, but instead whether it will be obvious to everyone with the capacity to make it.

The scenarios below are neither mutually exclusive nor collectively exhaustive, though I’m trying to cover the majority of scenarios which are directly tackled by making AI more likely to try to do what we want (and not do what we don’t). I’ve decided to include some kinds of misuse risk, despite misuse more typically being separated from misalignment risk, because in the current foundation model paradigm there is a clear way in which the developers of such models can directly reduce misuse risk via alignment research. Many of the risks below interact with each other in ways which are difficult