METR's paper Measuring AI Ability to Complete Long Tasks found a Moore's-law-like trend relating (model release date) to (time needed for a human to do a task the model can do).
Here is their rationale for plotting this.
Current frontier AIs are vastly better than humans at text prediction and knowledge tasks. They outperform experts on most exam-style problems for a fraction of the cost. With some task-specific adaptation, they can also serve as useful tools in many applications. And yet the best AI agents are not currently able to carry out substantive projects by themselves or directly substitute for human labor. They are unable to reliably handle even relatively low-skill, computer-based work like remote executive assistance. It is clear that capabilities are increasing very rapidly in some sense, but it is unclear how this corresponds to real-world impact.
AI alignment research seems to fall into this category. LLMs clearly have enough "expertise" at this point, but good research of any kind takes an expert a lot of time, even when it is done purely on paper.
It seems, therefore, that we could use METR's law to predict when AI will be capable of alignment research, or at least when it could substantially help.
My question is: for what time t does "automatically do tasks that humans can do in t" let us do enough research to solve the alignment problem?
(Even if you're not a fan of automating alignment, if we do make it to that point we might as well give it a shot!)
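To make the question concrete, one can write the trend as an exponential and solve for the date at which a given horizon t is reached. This is only a sketch: the symbols H_0 (the time horizon at some reference date d_0) and T (the doubling time) are notation introduced here for illustration, and their values would have to be read off METR's fit.

```latex
H(d) = H_0 \cdot 2^{(d - d_0)/T}
\quad\Longrightarrow\quad
d(t) = d_0 + T \log_2\!\left(\frac{t}{H_0}\right)
```

Under this form, each doubling of the required t pushes the date back by one doubling time T, so the answer is fairly insensitive to small errors in t. What remains open is which t is enough.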
To apply METR's law we should distinguish conceptual alignment work from well-defined alignment work (including empirics and theory on existing conjectures). The METR plot doesn't tell us anything quantitative about the former.
As for the latter, let's take interpretability as an example. We can model our uncertainty as a distribution over the time horizon needed for interpretability research, e.g. ranging over 40-1000 hours. Doing so, I get a 66% CI of 2027-2030 for open-ended interp research automation (colab here). I've written up more details on this in a post here.
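For readers who want to see the shape of such an estimate without opening the colab, here is a minimal Monte Carlo sketch. It is not the linked colab's code and will not necessarily reproduce the 2027-2030 interval. Everything numeric in it is an assumption chosen for illustration: a time horizon of about 1 hour in early 2025, a doubling time drawn uniformly from 4 to 7 months, and a log-uniform distribution over the required horizon between 40 and 1000 hours. The function names are hypothetical.

```python
import math
import random

# ASSUMPTIONS for illustration only; none of these values come from the post.
H0_HOURS = 1.0                # assumed time horizon at the reference date
D0_YEAR = 2025.2              # assumed reference date (early 2025), decimal year
DOUBLING_MONTHS = (4.0, 7.0)  # assumed range for the doubling time, in months


def year_horizon_reached(t_hours, doubling_months, h0=H0_HOURS, d0=D0_YEAR):
    """Decimal year at which an exponential trend reaches a horizon of t_hours."""
    doublings = math.log2(t_hours / h0)
    return d0 + doublings * doubling_months / 12.0


def sample_years(n=100_000, lo_hours=40.0, hi_hours=1000.0, seed=0):
    """Sample arrival years under uncertainty in required horizon and doubling time."""
    rng = random.Random(seed)
    years = []
    for _ in range(n):
        # Log-uniform over the required horizon: equal weight per doubling.
        t = math.exp(rng.uniform(math.log(lo_hours), math.log(hi_hours)))
        # Uniform over the assumed doubling-time range.
        doubling = rng.uniform(*DOUBLING_MONTHS)
        years.append(year_horizon_reached(t, doubling))
    years.sort()
    return years


def central_interval(sorted_vals, mass=0.66):
    """Central interval containing roughly `mass` of the sorted samples."""
    tail = (1.0 - mass) / 2.0
    n = len(sorted_vals)
    lo = sorted_vals[int(tail * (n - 1))]
    hi = sorted_vals[int((1.0 - tail) * (n - 1))]
    return lo, hi


if __name__ == "__main__":
    lo, hi = central_interval(sample_years())
    print(f"66% interval: {lo:.1f} to {hi:.1f}")
```

The result is sensitive to the assumed doubling time and starting horizon, so it is worth swapping in your own values for the three constants at the top before reading anything into the output.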