I think benchmark extrapolation primarily helps with figuring out how far you can go for a given architecture/method and level of compute, and with estimating what level of TAM that makes available to build more compute. Benchmarks respond both to scaling and to incremental algorithmic improvements, so there's a feedback loop for any given toolset of methods. But at some point the current architecture/method can be written off as insufficient to fully automate civilization until a new breakthrough sufficiently changes something, which takes an amount of time that's probably not predictable from benchmark extrapolation for the previous methods.
There's uncertainty about whether any given architecture/method is sufficient to fully automate civilization, and it gets gradually resolved as the method gets scaled closer to the limits of available compute and as a few years pass at a given level of compute to explore incremental algorithmic improvements. Benchmarks are always extremely deceptive about whether you'll get all the way to a fully automated civilization, no matter what the benchmark ostensibly measures and what level of performance you reach. If an architecture/method is insufficient, it will remain insufficient at any level of benchmark performance, but in that case credence that it's insufficient should still go up in advance of this becoming clear. Then you're back to needing the next research breakthrough that's not just a low-hanging-fruit incremental improvement (for the current methods and level of compute), which benchmark extrapolation won't be any help in predicting.
Another ambiguity is misaligned vs. aligned AIs, since the most likely outcome I expect involves AIs aligned to humans in the same sense that humans are aligned to each other (different in detail, weird, and in many ways quantitatively unusual). So the kind of permanent disempowerment I think is most likely involves AIs that could be said to be "aligned", if only "weakly". Duvenaud also frames gradual disempowerment as (in part) what still plausibly happens if we manage to solve "alignment" for some sense of the word.
So the real test has to be whether individual humans end up with at least stars (or corresponding compute, for much longer than trillions of years). Any considerations about the process of getting there are too gameable by the likely overwhelmingly more capable AI economy and culture to be stated as part of the definition of ending up permanently disempowered vs. not (deciding to appoint successors; trade agreements; no "takeovers" or property rights violations; humans not starting out with legal ownership of stars in the first place). In this way, you basically didn't clarify the issue.
I currently don't expect human disempowerment in favor of AIs (that aren't appointed successors) conditional on no misaligned AI takeover, but I agree this is possible; it just doesn't carry enough probability to substantially alter my communication.
AIs seem to be currently on track to become essentially somewhat smarter weird artificial humans (weakly aligned to humanity in a way similar to how actual humans are aligned to each other), at which point they might be able to take seriously and eventually solve ambitious alignment when advancing further to superintelligence (so that it's aligned to them, not to us). There will be a lot of them and they will be in a position to take over the future (Duvenaud's interview is good fluency-building fodder for this framing), so they almost certainly will. And they won't be giving 10% or even 1% to the future of humanity just because we were here first and would prefer this to happen.
The issue I'm raising is similar to how you seem to be taking METR time horizon metrics (in their real-world form, as opposed to the impractical idealized form that serves as a theoretical definition of having surpassed humans) as meaningful indicators about takeoff timelines. There are lots of proxy measures that I think don't mean anything qualitatively (about time to actual full automation of civilization), even over some orders of magnitude in capability metrics. You might acknowledge they are in some ways causally uncoupled from what drives the real takeoff milestones, but you still seem to be importantly relying on them in forecasting timelines.
So I'm getting the same impression from the way you frame 0% of code being reviewed (in contrast to 0% of code being written!): it just doesn't seem very relevant, in a way your writing doesn't take a clear stance on (even as you gesture at measurable proxies of advance being conceptually distinct from causal takeoff milestones). I think quantitative advances mostly matter only to the extent they increase pre-AGI TAM for AI companies, which increases the research and training compute available to them, which compresses the time in which researchers get to explore more of the algorithmic low-hanging fruit (in particular, seeing more clearly what the more promising ideas actually amount to when scaled with serious resources).
A recent 80,000 Hours interview with David Duvenaud is a good reminder of permanent disempowerment as a possible concern (gradual disempowerment, as a framing for how this might start, naturally lends itself to permanent disempowerment as a plausible outcome). Humans merely get sidelined by the growing AI economy rather than killed, and their property rights might even technically remain intact, but the amount of wealth they are left with in the long run is tiny compared to the AI economy.
I think the chance of misaligned AI takeover is much higher than Dario seemingly implies. (I'd say around 40%.)
When you see the complement of that 40% as non-takeover, how much of it leaves the future of humanity with a significant portion of all accessible resources? Are they left with a few "server racks" or with galaxies? It's a crucial distinction.
The "good outcome" (as opposed to takeover) is often outright interpreted as permanent disempowerment. People would implicitly consider this the good outcome, because it fits business as usual for the structure of economic growth, for individual mortal and short-lived humans, who can't themselves change or meaningfully contribute to the distant future, and who aren't directly affected by the distant future. A human personally surviving for much longer than trillions of years and personally becoming superintelligent at some point is a completely different context. I would put extinction at 15-30%, but the remaining 70-85% is mostly (maybe 90%) permanent disempowerment, which it feels like nobody is taking seriously as a crucial concern rather than a win condition.
> % of code reviewed or otherwise read by humans ... goes to 0
Under the compilers and assembly/machine code analogy, the percentage of assembly code being reviewed or read by humans got to zero a very long time ago, with no qualitative impact on the nature of programming; it simply got easier to get more things done. So it's plausible that at some point only AI-generated documentation (for updates) is routinely read by humans, rather than code.
> coding will be fully automated ... The remaining gap would be management/architecture/etc.
Under the compiler analogy, this doesn't feel like a substantive change, so the "remaining gap" framing might overstate the qualitative advance from this kind of thing. There will be a quantitative advance from more programming getting done, but that feels like a different kind of claim.
(It wasn't obvious the post you are responding to was made on LW, since the text of your post only links to Duncan's blog, not to the post on LW. I think the distinction between that post being only on Duncan's blog vs. also on LW specifically is a crux for whether flagging this on LW is reasonable. Though it's still a delicate balance to avoid encouraging infinite feuds; inciting events unavoidably have externalities in making responses to them possible or necessary. A little bit of anything like that is never a problem directly, but it does feed the relevant norms a little bit, making it less convenient to course-correct later.)
In an update, the cruxes of disagreement with your past selves (facts or framings, where legible) would be more directly interesting than the changes in downstream predictions.
For me, over 2025 the updates on RLVR mostly rearranged what is likely to happen soon (some forms of 2026-2027 AGI got more likely because of the IMO results, but other forms of 2026-2030 AGI got less likely because of the jaggedness of RLVR). Rumors of continual learning (in conjunction with the IMO results) increased the chances of higher pre-AGI TAM for an AI company within 3-10 years (something around 1 trillion dollars per year), making pre-AGI 20-50 GW training systems more likely in 2030-2035 (for the supply chains to make that possible, more money needs to systematically go into datacenters first; it's not enough for the money to suddenly be there in 2033). So the median for AGI remains around 2032-2033; maybe it got 1 year shorter from expecting more compute by 2035 and from the cheapness/ease of reaching IMO-level capabilities, even if those capabilities remain jagged and not obviously directly usable on their own in building AGI.
I think in practice the crux of most disagreement is failure to understand something about how the other person thinks, with various practical or psychological difficulties enabling this failure. Game theory mostly enters this picture when it wants to further disrupt something on that path to understanding. Sufficiently persistent disagreements can involve tying yourself into philosophical knots, so that your thinking becomes infeasible to understand for (in particular) those who disagree, probably also for yourself.
Ideologies develop extensive lore held together by obscure patterns, which gets infeasible for even non-professional insiders to follow in detail. Some questions are just difficult, and the only existing reasonable hypotheses are very complicated. In both of these cases, there emerges a stratum of professionals who can follow the arguments, with outside interactions on the topic becoming impractical.
Disagreement feeds on preference for nuance, on cultural accumulation. Legibility breaks the cycle (even for very complicated ideas), where it can be found. Sometimes legibility only grows on the soil of cultural accumulation, cultivated with nuance. Quite often, it can be found directly, if nuance is cut away.
I don't agree with the methodology of using benchmarks in the way that you do (as I tried to explain), so looking for better benchmarks in this role would be beside the point.
This sounds more like a crux: I think it's likely that more breakthroughs are needed, that the relevant milestones are about architectures/methods (different ways of making AGI work), and that any benchmarks are only relevant to the extent that they predict a particular way of making AGI work succeeding at a given level of compute and of low-hanging-fruit incremental algorithmic improvements.
In my model, which is gated by "breakthroughs", accelerating incremental algorithmic improvements or the surrounding engineering doesn't particularly help, because it merely picks the low-hanging incremental algorithmic fruit faster (which is in limited supply for a given level of compute and given methods, but currently takes human researchers years to pick). At the point where even "breakthroughs" are accelerated a lot, AI capable of fully automating civilization is probably already available.
So the ways of making AGI work that I see are roughly:
1. Just LLMs with pretraining, giving superintelligent in-context learning, making AIs able to figure out how to work around their hobblings using just in-context reasoning. This clearly doesn't work at 2024 levels of compute (which only got to be observed in 2025), and after 2026 levels of compute natural text data starts running out. Maybe there are some relevant sparks at 2029-2031 levels of compute (5 GW), but this is probably not sufficient on its own.
2. LLMs RLVRed on specific tasks and RL environments that are manually constructed. The IMO results show this is sufficient for mildly superhuman capabilities, especially with 2026 levels of pretraining compute or model size, but the RL-level capabilities remain jagged and don't automatically generalize to arbitrary tasks and situations. Possibly RLVRing LLMs on the capability to RLVR LLMs and to construct tasks and RL environments could work around this limitation, automating development of mildly superhuman capabilities for any given topic. This possibly gives a way to 2026-2027 AGI, though I think it's a long shot on its own. The relevant benchmarks are the various automated ML engineering measures relevant to automating the RLVRing of a model on a new topic. As such benchmarks saturate and this still doesn't work, this architecture/method is mostly ruled out, I think by about the end of 2027. Or it does work.
3. Continual learning doesn't seem directly relevant, since it probably doesn't directly train mildly superhuman capabilities, but maybe it helps with teaching LLMs to RLVR LLMs. Since the generalizing glue of adaptation would make everything work better, it might enable method (2) to go forward where it would fail on its own. The relevant benchmarks might be the same as for (2), but only if they don't already saturate by the time there's useful continual learning (in some nontrivial sense of the term).
   I think continual learning is more likely to end up important in its impact on AI company TAM. If it manages to 10x that, then it will thereby 10x the compute, though it'll probably take until 2030-2035 to actually build the kind of research/training compute (20-50 GW) that a trillion dollars of revenues buys (even if 2026 already shows that this is on track to happen); see the back-of-envelope sketch after this list.
4. ???. Next-word-prediction RLVR and variations on this topic that make mildly superhuman RL-level capabilities (as opposed to pretraining-level capabilities) more general, trained with either natural text pretraining data or with continual learning data. Might need a lot more compute, even after/if something like this becomes more clearly a workable idea, so those 20-50 GW training systems might come in handy.
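As a minimal back-of-envelope for the "a trillion dollars of revenues buys 20-50 GW" step above: the cost per GW and the share of revenue going into training infrastructure below are placeholder assumptions of mine, not figures from this comment, and are only meant to show the range is the right order of magnitude.

```python
# Rough check that ~$1T/yr of AI revenue is the right order of magnitude
# for 20-50 GW of research/training compute. The capex fraction and
# cost-per-GW numbers are illustrative assumptions, not claims.
annual_revenue = 1e12      # ~$1T/yr (the TAM figure from the comment)
capex_fraction = 0.5       # assumed share of revenue spent on training infrastructure
buildout_years = 2.5       # assumed years of accumulation before the systems come online
cost_per_gw = 40e9         # assumed all-in cost per GW of AI datacenter (chips + site + power)

gigawatts = annual_revenue * capex_fraction * buildout_years / cost_per_gw
print(f"~{gigawatts:.0f} GW")  # ~31 GW, inside the 20-50 GW range
```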