First things first: this post is about timelines (i.e. what year we first get transformative AGI), not takeoff speeds (i.e. foom vs gradual takeoff).
Claim: timelines are mostly not strategically relevant to alignment research, i.e. deciding what to work on. Why? Because at any given time, it would take ~18 months to take whatever our current best idea is, implement it, do some basic tests, and deploy it. (Really it probably takes less than 6 months, but planning fallacy and all that.) If AGI takeoff is more than ~18 months out, then we should be thinking “long-term” in terms of research; we should mainly build better foundational understanding, run whatever experiments best improve our understanding, and search for better ideas. (Note that this does not necessarily mean a focus on conceptual work; a case can be made that experiments and engineering feedback are the best ways to improve our foundational understanding.)
What about strategic decisions outside of object-level research? Recruitment and training strategies for new researchers might depend on how soon our investments need to pay off; do we look for a brilliant young person who will need three to five years of technical study before they’re likely to make any important progress, or a more experienced person who can make progress right now but is probably already near their skill ceiling? How much should existing technical researchers invest in mentoring new people? Those are questions which depend on timelines, but the relevant timescale is ~5 years or less. If AGI is more than ~5 years out, then we should probably be thinking “long-term” in terms of training; we should mainly make big investments in recruitment and mentorship.
General point: timelines are usually only decision-relevant if they’re very short. Like 18 months, or maybe 5 years for relatively long-term investments. The difference between e.g. 10 years vs 30 years vs 100 years may matter a lot for our chances of survival (and the difference may therefore be highly salient), but it doesn’t matter for most actual strategic decisions.
Meta note: there are a lot of obvious objections which I expect to address in the comments; please check whether anyone has posted your objection already.
In my view, there are alignment strategies that are unlikely to pay off without significant time investment, but which have large expected payoffs. For example, work on defining agency seems to fit this category.
There are also alignment strategies that have incremental payoffs, but still seem unsatisfactory. For example, we could focus on developing better AI boxing techniques that just might buy us a few weeks. Or we could identify likely takeover scenarios and build warnings for them.
There's an analogy for this in self-driving cars. If you want to ship an impressive demo right away, you might rely on a lot of messy case handling, special road markings, mapping, and sensor arrays. If you want to solve self-driving in the general case, you'd probably be developing really good end-to-end ML models.
Yup, that's a place where I mostly disagree, and it is a crux. In general, I expect the foundational progress which matters mostly comes from solving convergent subproblems ( = subproblems which are a bottleneck for lots of different approaches). Every time progress is made on one of those subproblems, it opens up a bunch of new strategies, and the...