Great post! Glad to see more discussion of the implications of short timelines on impactful work prioritization on LW.
These last two categories (influencing policy discussions and introducing research agendas) rely on social diffusion of ideas, and this takes time. With shorter timelines in mind, this only makes sense if your work can actually shape what other researchers do before AI capabilities advance significantly.
Arguably this is not just true of those two avenues for impactful work, but rather all avenues. If your work doesn't cause someone in a position of power to make a better decision than they otherwise would (e.g., implement this AI control solution on a production model, appoint a better-informed person to lead such-and-such an AI project, care about AI safety because they saw a scary demo, etc.), it's unlikely to matter. Since timelines are short and governments are likely to get involved soon, only a highly concentrated range of actors have final sway over decisions that matter.
Is this post an argument for accelerationism? Because the work that it is always timely to do right now is work that progresses the march towards the AGI that is to obsolete all the other work (if it doesn't kill us). Just as in the interstellar spaceship analogy, the timely work before the optimum time to launch is work on better propulsion.
Hmm. I think it's an argument for the mindset of preparing for a final safety sprint right near the end.
It's something I've talked about with others: the idea that "our best chance to make progress on safety will be right at the moment before we lose control because the AI is too strong. If we can concentrate our work then, and maybe focus on prepping for that time now, we can do a fast sprint and save the world at that critical juncture."
I feel torn about this, since I worry that we might mistime it and overshoot the optimal point without realizing it.
This post isn't exactly talking about this dynamic, but it kinda fits the rough pattern. I think there's something to the point being made, but also, I see danger in taking it too far.
tl;dr: LLMs rapidly improving at software engineering and math means lots of projects are better off as Google Docs until your AI agent intern can implement them.
Implementation keeps getting cheaper
Writing research code has gotten a lot faster over the past few years. Since OpenAI Codex in 2021, new models and the tools built around them, such as Cursor, have saved me more and more time on coding every year.
This trend is accelerating fast: AI agents using Claude-3.5-Sonnet and o1-preview can do tasks that take ML researchers up to 2 hours of coding. This is without considering newer models such as o3, which scores 70% on SWE-bench out of the box.
Yet this progress remains somewhat concentrated in implementation: progress on “soft” skills like idea generation has, as far as I can tell, been slower.
I’ve come to believe that, if you work in technical AI safety research, this trend is a very practical consideration that should be the highest order bit in your decisions on what to spend time on.
Hence, my New Year's resolution is the following: Do not work on a bigger project if there is not a clear reason for doing it now. Disregarding AGI timelines [1], the R&D acceleration is a clear argument against technical work where the impact does not critically depend on timing.
When later means better
The wait calculation in space travel is a cool intuition pump for today’s AI research. In short, when technological progress is sufficiently rapid, later projects can overtake earlier ones.
For instance, a space probe sent to Alpha Centauri in 2025 will likely reach there after the one sent in 2040, due to advances in propulsion technology. Similarly, starting a multi-year LLM training run in 2022 would not have yielded a better model than starting a much shorter training run in 2024. [2]
The above examples involve long feedback loops, and it’s clear why locking in too early has issues: path dependence is high, and the tech improves quickly.
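As a rough illustration of the wait calculation, here is a minimal Python sketch. The starting speed, the growth rate, and the function names are all made-up assumptions chosen for illustration, not figures from this post or from any real mission; only the approximate distance to Alpha Centauri is a real number.

```python
ALPHA_CENTAURI_LY = 4.37  # approximate distance in light-years


def arrival_year(launch_year, base_year=2025, base_speed_c=0.001,
                 annual_speed_growth=0.10, distance_ly=ALPHA_CENTAURI_LY):
    """Year in which a probe launched in `launch_year` arrives.

    Hypothetical model: the best available speed (as a fraction of the
    speed of light) grows by a fixed percentage each year, and a probe
    keeps the speed it had at launch for the whole trip.
    """
    years_of_progress = launch_year - base_year
    speed_c = base_speed_c * (1 + annual_speed_growth) ** years_of_progress
    speed_c = min(speed_c, 1.0)  # nothing travels faster than light
    return launch_year + distance_ly / speed_c


def best_launch_year(candidates):
    """Launch year among `candidates` with the earliest arrival."""
    return min(candidates, key=arrival_year)


if __name__ == "__main__":
    for year in (2025, 2040):
        print(year, "->", round(arrival_year(year)))
    print("best launch year:", best_launch_year(range(2025, 2200)))
```

Under these assumed parameters, the 2040 launch arrives before the 2025 one. Waiting is not always better, though: past some point each year of delay costs more than the extra speed it buys, which is why there is an optimal launch time at all.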
Now, my research (and likely your research too) has had much faster feedback loops, and path dependence in research projects is not that high if LLMs can refactor the codebase. However, it still does not make marginal sense to start some projects now if those can be done later.
If you work in AI safety, a common issue is having a lot of half-baked project ideas that you'd like to do and too little time to try them all. The silver lining of fast AI R&D improvement is that many of these ideas will become much easier to implement in the future. Thus, strategic timing (deciding which projects truly benefit from being done now) has become a crucial research meta-skill.
Did I do well in 2024?
To get a grasp on what this means in practice, I decided to go through the papers I contributed to in 2024, in chronological order, and analyze whether it was good that each paper was done at the time rather than later, all else being equal. [3]
The core issue working against this paper is that the documented attack/defense dynamic is highly dependent on the capabilities of the models used; and since we used models that are now outdated, I doubt the findings will be robustly useful for prompt injection / extraction research. The same paper could be done much more efficiently in 2026, with more relevant results.
All the papers above are accepted (or likely to be accepted) in ML venues on first submission, and some are heavily cited, so there is some evidence the papers are considered good by conventional standards. Yet half of them were mistimed.
The above analysis ignores other reasons a paper might not be counterfactually impactful, such as parallel discovery, scooping other researchers (or being scooped), or even the mundane “this research direction didn’t end up being useful after all”. For example, another group did a big chunk of the Stealing paper independently and published a week later; and several teams worked on similar concepts to the Refusal paper before and after us.
On the other hand, a key product of research is the expertise gained from writing the papers; I'm definitely a stronger researcher now than a year ago, and it would have been hard to gain that experience if I hadn't gotten my hands dirty on some of the above work.
Looking back, I think my efforts look better than I expected, given that last year I did not optimize for the consideration of this post at all. But it’s far from perfect. If you’re doing AI safety research, I'd encourage you to do a similar audit of your own work.
Themes for temporally privileged work
So, why now and not later? The previous section has a few themes for work that can be worthwhile doing as soon as possible rather than waiting:
In addition, I can recall more reasons:
These last two categories (influencing policy discussions and introducing research agendas) rely on social diffusion of ideas, and this takes time. With shorter timelines in mind, this only makes sense if your work can actually shape what other researchers do before AI capabilities advance significantly. If you do not have existing credibility or a concrete plan for how it reaches the right audience, it might not be worth it.
In fact, technically motivated research faces a similar challenge: unless you're working at a leading AGI lab or in a position to influence one, your brilliant insights might never make it into frontier models. [4]
As for research that is not worth doing now, I do have some opinions, but I think better advice is to just apply this mindset on a case-by-case basis. [5] Pick some reasonable prior, say a 50% reduction in total coding time per year; and before starting any significant technical work, write down a brief description of what you're trying to achieve and make an explicit case for why it needs to be done this year rather than in 25% of the engineering time in two years.
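The arithmetic behind that prior can be checked in a few lines. This is a minimal sketch: the function names and the 6-month example project are hypothetical, and the 50% annual reduction is just the example prior from the paragraph above, not a measured rate.

```python
def remaining_time_fraction(years_waited, annual_reduction=0.5):
    """Fraction of today's engineering time still needed after waiting.

    Assumes the reduction compounds: each year removes the same share
    of whatever coding time was left the year before.
    """
    return (1.0 - annual_reduction) ** years_waited


def engineering_months(months_today, years_waited, annual_reduction=0.5):
    """Engineering time for a project that would take `months_today` now."""
    return months_today * remaining_time_fraction(years_waited, annual_reduction)


if __name__ == "__main__":
    # Hypothetical project that would take 6 months of engineering today.
    for years in (0, 1, 2):
        fraction = remaining_time_fraction(years)
        print(years, fraction, engineering_months(6, years))
```

With a 50% yearly reduction, two years of waiting leaves 0.5 × 0.5 = 25% of the engineering time, which is where the figure in the text comes from. A different prior changes the number but not the exercise: the case for starting now has to outweigh the effort saved by waiting.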
Thanks to Nikola Jurkovic for reading a draft of this post.
[1] And if you are indeed operating on a tight schedule, Orienting to 3-year AGI timelines agrees:
[2] The Longest Training Run analysis from Epoch indicates that hardware and algorithmic improvements incentivize shorter training runs.
[3] This analysis reflects my contributions and views alone, not those of coauthors, especially on papers where I am not a joint first author.
[4] Again from Orienting to 3-year AGI timelines:
[5] Consider also what Gwern had to say and don't let this quote describe you: