The linked post is work done by Tom Adamczewski while at FHI. I think this sort of expository and analytic work is very valuable, so I'm cross-posting it here (with his permission). Below is an extended summary; for the full document, see his linked blog post.
Many people now work on ensuring that advanced AI has beneficial consequences. But members of this community have made several quite different arguments for prioritising AI.
Early arguments, and in particular Superintelligence, identified the “alignment problem” as the key source of AI risk. In addition, the book relies on the assumption that superintelligent AI is likely to emerge through a discontinuous jump in the capabilities of an AI system, rather than through gradual progress. This assumption is crucial to the argument that a single AI system could gain a “decisive strategic advantage”, that the alignment problem cannot be solved through trial and error, and that there is likely to be a “treacherous turn”. Hence, the discontinuity assumption underlies the book’s conclusion that existential catastrophe is a likely outcome.
The argument in Superintelligence combines three features: (i) a focus on the alignment problem, (ii) the discontinuity assumption, and (iii) the resulting conclusion that an existential catastrophe is likely.
Arguments that abandon some of these features have recently become prominent. They have also generally been made in less detail than the early arguments.
One line of argument, promoted by Paul Christiano and Katja Grace, drops the discontinuity assumption, but continues to view the alignment problem as the source of AI risk. They argue that, even under more gradual scenarios, unless we solve the alignment problem before advanced AIs are widely deployed in the economy, these AIs will cause human values to eventually fade from prominence. They appear to be agnostic about whether these harms would warrant the label “existential risk”.
Moreover, others have proposed AI risks that are unrelated to the alignment problem. I discuss three of these: (i) the risk that AI might be misused, (ii) the risk that it could make war between great powers more likely, and (iii) the risk that it might lead to value erosion from competition. These arguments don’t crucially rely on a discontinuity, and the risks are rarely existential in scale.
It’s not always clear which of the arguments actually motivates members of the beneficial AI community. It would be useful to clarify which of these arguments (or yet other arguments) are crucial for which people. This could help with evaluating the strength of the case for prioritising AI, deciding which strategies to pursue within AI, and avoiding costly misunderstandings with sympathetic outsiders or sceptics.
I agree that a slower takeoff makes the problem easier, but disagree about how slow is slow enough. I have pretty high confidence that a 200-year takeoff is slow enough; faster than that, I become increasingly unsure.
For example: one scenario would be that there are years, even decades, in which worse and worse AGI accidents occur, but the alignment problem is very hard and no one can get it right (or: aligned AGIs are much less powerful and people can't resist tinkering with the more powerful unsafe designs). As each accident occurs, there's bitter disagreement around the world about what to do about this problem and how to do it, and everything becomes politicized. Maybe AGI research will be banned in some countries, but maybe it will be accelerated in other countries, on the theory that (for example) smarter systems and better understanding will help with alignment. And thus there would be more accidents and bigger accidents, until sooner or later there's an existential catastrophe.
I haven't thought about the issue super-carefully ... just a thought ...