In METR: Measuring AI Ability to Complete Long Tasks found a Moore's law like trend relating (model release date) to (time needed for a human to do a task the model can do).
Here is their rationale for plotting this.
Current frontier AIs are vastly better than humans at text prediction and knowledge tasks. They outperform experts on most exam-style problems for a fraction of the cost. With some task-specific adaptation, they can also serve as useful tools in many applications. And yet the best AI agents are not currently able to carry out substantive projects by themselves or directly substitute for human labor. They are unable to reliably handle even relatively low-skill, computer-based work like remote executive assistance. It is clear that capabilities are increasing very rapidly in some sense, but it is unclear how this corresponds to real-world impact.
It seems that AI alignment research falls into this. The LLMs clearly have enough "expertise" at this point, but doing any kind of good research takes an expert a lot of time, even when it is purely on paper.
It seems therefore that we could use Metr's law to predict when AI will be capable of alignment research. Or at least when it could substantially help.
My question is what time t does "automatically do tasks that humans can do in t" let us do enough research to solve the alignment problem?
(Even if you're not a fan of automating alignment, if we do make it to that point we might as well give it a shot!)
8 hours of clock time for an expert seems likely to be enough to do anything humans can do; people rarely productively work in longer chunks than that, and as long as we assume models are capable of task breakdown and planning, (which seems like a non trivial issue, but an easier one than the scaling itself,) that should allow it to parallelize and serialize chucks to do larger human-type tasks.
But it's unclear alignment can be solved by humans at all, and even if it can, of course, there is no reason to think these capabilities would scale as well or better for alignment than for capabilities and self-improvement, so this is not at all reassuring to me.