The slow takeoff hypothesis predicts that AGI emerges in a world where powerful but non-AGI AI is already a really big deal. Whether AI is a big deal right before the emergence of AGI determines many basic things about how we should think about our current job. I hadn’t fully appreciated the size of this effect until a few days ago.
In particular, in a fast takeoff world, AI takeover risk never looks much more obvious than it does now, and so we should expect x-risk-motivated people to do the majority of the alignment research that happens. In contrast, in a slow takeoff world, many aspects of the AI alignment problem will already have shown up as alignment problems in non-AGI, non-x-risk-causing systems; in that world, there will be lots of industrial work on various aspects of the alignment problem, and so EAs now should think of themselves as trying to look ahead, figure out which margins of the alignment problem aren’t going to be taken care of by default, and figure out how to help out there.
In the fast takeoff world, we’re much more like a normal research field–we want some technical problem to eventually get solved, so we try to solve it. But in the slow takeoff world, we’re basically in a weird collaboration across time with the more numerous, non-longtermist AI researchers who will be in charge of aligning their powerful AI systems but who we fear won’t be cautious enough in some ways or won’t plan ahead in some other ways. Doing technical research in the fast takeoff world basically just requires answering technical questions, while in the slow takeoff world your choices about research projects are closely related to your sociological predictions about what things will be obvious to whom when.
I think that these two perspectives are extremely different, and I think I’ve historically sometimes had trouble communicating with people who held the slow takeoff perspective because I didn’t realize we disagreed on basic questions about the conceptualization of the question. (These miscommunications persisted even after I was mostly persuaded of slow takeoffs, because I hadn’t realized the extent to which I was implicitly assuming fast takeoffs in my picture of how AGI was going to happen.)
As an example of this, I think I was quite confused about what genre of work various prosaic alignment researchers think they’re doing when they talk about alignment schemes. To quote a recent AF shortform post of mine:
Something I think I’ve been historically wrong about:
A bunch of the prosaic alignment ideas (eg adversarial training, IDA, debate) now feel to me like things that people will obviously do the simple versions of by default. Like, when we’re training systems to answer questions, of course we’ll use our current versions of systems to help us evaluate, why would we not do that? We’ll be used to using these systems to answer questions that we have, and so it will be totally obvious that we should use them to help us evaluate our new system.
Similarly with debate--adversarial setups are pretty obvious and easy.
In this frame, the contributions from Paul and Geoffrey feel more like “they tried to systematically think through the natural limits of the things people will do” than “they thought of an approach that non-alignment-obsessed people would never have thought of or used”.
It’s still not obvious whether people will actually use these techniques to their limits, but it would be surprising if they weren’t used at all.
I think the slightly exaggerated slogan for this update of mine is “IDA is futurism, not a proposal”.
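To make the quoted “use our current versions of systems to help us evaluate” idea concrete, here is a minimal illustrative sketch in Python of that simple-version-by-default pattern: an existing model rates candidate answers from the system being trained, and the best-rated answer is kept as the training target. The `old_model` / `new_model` callables and the rating prompt are hypothetical stand-ins of my own, not part of any specific proposal; this is a sketch of the default thing people might do, not an implementation of IDA or debate.

```python
# Hypothetical sketch: an existing model helps evaluate a new model's answers.
# `Model` is just "prompt in, text out"; swap in whatever API you actually have.
from typing import Callable, List

Model = Callable[[str], str]


def evaluate_with_current_model(question: str, candidates: List[str], old_model: Model) -> str:
    """Ask the existing model to rate each candidate answer 1-10 and keep the best one."""

    def score(answer: str) -> float:
        critique = old_model(
            f"Question: {question}\nProposed answer: {answer}\n"
            "Rate this answer's accuracy from 1 to 10. Reply with just the number."
        )
        try:
            return float(critique.strip().split()[0])
        except (ValueError, IndexError):
            return 0.0  # unparseable critiques get the lowest score

    return max(candidates, key=score)


if __name__ == "__main__":
    # Dummy stand-in models, purely for illustration.
    old_model: Model = lambda prompt: "7"        # pretend existing evaluator
    new_model: Model = lambda prompt: "Paris."   # pretend system being trained

    question = "What is the capital of France?"
    candidates = [new_model(question) for _ in range(4)]
    best = evaluate_with_current_model(question, candidates, old_model)
    print(best)  # the candidate the old model rated highest becomes the training target
```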
My current favorite example of the thinking-on-the-margin version of alignment research strategy is in this comment by Paul Christiano.
I agree that the thinking-on-the-margin version of alignment research is crucial, and that it is one of the biggest areas of opportunity for increasing the probability of success. Given the seemingly low current probability of success, it is at the very least worth trying.
Regarding the general public, one of the premises is whether broader awareness could actually benefit the problem. My intuition is that AI alignment should be debated far more in universities and in the technology industry; the current lack of awareness is concerning and hard to believe.
We should evaluate the outcomes of raising awareness more carefully, taking into account all the options: a general (public) awareness strategy, a partial (experts-only) strategy, and the spectrum of variations in between. It seems that current alignment leaders have strong reasons not to focus on expanding awareness, or even to dismiss the strategy as not possibly useful. I believe these reasons should not be treated as fixed; they should be debated more, rather than written off as too hard to implement.
We can't assume that someone capable of solving alignment is also aware of the problem. It seems likely that some people who are capable of solving alignment don't currently understand the true magnitude of the problem. In that case, a necessary step on the path to success is getting them to understand the problem, and we can play a crucial role in that. I understand that with this strategy, as with many alignment strategies, the probability that it reduces rather than increases our chance of success must be carefully evaluated.
Within the current alignment research context, there is probably also an opportunity in taking more thinking-on-the-margin approaches. The impact of present and near-future AI systems on AGI and on alignment is very likely of high importance, more so than the attention it currently receives, because these systems are very likely to shorten timelines (by how much is an important and currently ambiguous question). It seems we are not sufficiently evaluating the probable crucial impact of current deep learning models on the problem, though I'm glad the idea is gaining traction.
paulfchristiano (2022) states: "AI systems could have comparative disadvantage at alignment relative to causing trouble, so that AI systems are catastrophically risky before they solve alignment." I agree that this is one of the most important issues: if AI systems are capable of improving safe alignment research, they will very likely be even more capable of improving non-safe, default AI development and probably superintelligence-creation research. This means that the technology most likely to be crucial to the birth of superintelligence also lowers the probability of safe alignment. So two crucial questions are: how do we fight this, and, more essentially, how and by how much can current and near-future AI systems improve AGI creation?
Now I will propose a controversial thinking-on-the-margin tactic for alignment: the (I would argue highly) plausible generation of new, different, or better alignment strategies by alignment researchers taking advantage of stimulants and hallucinogens. We are in a situation where we must take any advantage we can, and non-ordinary states of consciousness seem well worth trying because of the near-zero risks involved. (The same may apply to nootropics, but I'm barely familiar with them.)
Finally, I will share what I believe should currently be the most important issue across all versions of alignment research, one that sits on top of all the previous ideas: if trying to safely align AI will almost certainly not solve our x-risk, as Eliezer Yudkowsky states in "MIRI announces new 'Death With Dignity' strategy", then all it will have achieved is higher s-risk probabilities. (Thank you for the info hazards T.T.) So one option is to aim to shorten the x-risk timeline, if that reduces the probability of s-risks: helping to build the superintelligence as soon as possible.
Or, alternatively, to shift the whole strategy toward lowering s-risks. This is especially relevant to us, because we now face a higher probability of s-risk (thanks e.e), so we should focus on the issues that have increased our s-risk probabilities.