AI danger is not about AI, it's about governance. A sane civilization would be able to robustly defer and then navigate AI danger when it's ready. AI is destabilizing, and while aligned AI (in a broad sense) is potentially a building block for a competent/aligned civilization (including human civilization), that's only if it's shaped/deployed in a competent/aligned way. Uploads are destabilizing in a way similar to AI (since they can be copied and scaled), even though by construction they ensure some baseline of alignment.
Intelligence amplification for biological humans (who can't be copied) seems like the only straightforward concrete plan that's not inherently destabilizing. But without highly speculative too-fast methods, it needs AI danger to be deferred for a very long time, with a ban/pause that achieves escape velocity (getting stronger rather than weaker over time, for example by heavily restricting semiconductor manufacturing capabilities). This way, there is hope for a civilization that eventually becomes sufficiently competent to navigate AI danger, but the premise of a civilization sufficiently competent to defer AI danger indefinitely is damning: it presupposes much of the competence the plan is meant to eventually produce.
if your effort to constrain your future self on day one does fail, I don't think there's a reasonable decision theory that would argue you should reject the money anyway
That's one of the things motivating UDT. On day two, you still ask what global policy you should follow (a policy that in particular covers your actions in the past, and in the counterfactuals relative to what you actually observe in the current situation). Then you see where/when you actually are, what you actually observe, and enact what the best policy says you should do in the current situation. You don't constrain yourself on day one, but you still enact the global policy on day two.
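To make that concrete, here's a minimal toy sketch in Python (the counterfactual-mugging setup and payoffs are my own illustrative choices, not the scenario from this thread): score whole policies from the prior perspective, then on day two just run what the best policy prescribes for the situation you actually observe.

```python
# Toy problem (my example): a fair coin is flipped on day one. On tails you
# receive 10000, but only if you *would* pay 100 when asked on heads. On heads
# you are asked to pay 100 and get nothing in that branch.

def expected_value(policy):
    """Score a whole policy from the prior (day one) perspective."""
    pays = policy("asked to pay on heads") == "pay"
    heads_branch = -100 if pays else 0
    tails_branch = 10000 if pays else 0
    return 0.5 * heads_branch + 0.5 * tails_branch

pay_policy = lambda situation: "pay"
refuse_policy = lambda situation: "refuse"

best_policy = max([pay_policy, refuse_policy], key=expected_value)

# Day two: you observe that the coin came up heads and you're asked to pay.
# You never constrained yourself on day one; you just enact the best global
# policy now, even though paying loses money in this particular branch.
print(best_policy("asked to pay on heads"))  # -> "pay"
```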
I think coordination problems are a lot like that. They reward you for adopting preferences genuinely at odds with those you may have later on.
Adopting preferences is a lot like enacting a policy, but when enacting a policy you don't need to adopt preferences: a policy is something external, an algorithmic action (instead of choosing Cooperate, you choose to follow some algorithm that decides what to do, even if that algorithm gets no further input). Contracts in the usual sense act like that; assurance contracts are an example where you are explicitly establishing coordination. You can judge an algorithmic action like you judge an explicit action, but there are more algorithmic actions than there are explicit actions, and algorithmic actions taken by you and your opponents can themselves reason about each other, which enables coordination.
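A toy sketch of that last point, in the style of program equilibrium (my own example; the depth-limited simulation is a crude stand-in for a proper proof search): players submit algorithms that can reason about each other's algorithms, and coordination falls out.

```python
# Each "algorithmic action" is a function that receives the opponent's
# algorithm and returns "C" (Cooperate) or "D" (Defect).

def defect_bot(opponent, depth=0):
    return "D"

def cooperate_bot(opponent, depth=0):
    return "C"

def fair_bot(opponent, depth=0):
    # Cooperate iff simulating the opponent against fair_bot yields cooperation.
    # The depth cap stands in for a bounded proof search; at the cap we
    # optimistically assume cooperation, which lets fair_bot cooperate with itself.
    if depth > 3:
        return "C"
    return "C" if opponent(fair_bot, depth + 1) == "C" else "D"

def play(a, b):
    return a(b), b(a)

print(play(fair_bot, fair_bot))       # ('C', 'C'): coordination without prior contact
print(play(fair_bot, defect_bot))     # ('D', 'D'): not exploitable
print(play(fair_bot, cooperate_bot))  # ('C', 'C')
```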
AI currently lacks some crucial faculties, most obviously continual learning and higher sample efficiency (possibly merely as a measure of how well continual learning works). And these things plausibly fall under the umbrella of the more schleppy kinds of automated AI R&D, so that if the AIs learn the narrow skills such as setting up appropriate RL environments (capturing lessons/puzzles from personal experiences of AI instances) and debugging training issues, that would effectively create these crucial faculties without actually needing to make deeper algorithmic progress. Like human computers in the 17th century, these AIs might end up doing manually what a better algorithm could do at a much lower level, much more efficiently. But it would still be much more effective than not doing it at all, and AI labor scales well.
to rationally pre-commit to acting irrationally in the future
As with the conflation around "belief", it's better to have a particular meaning in mind when calling something "rational", such as methods that help more with finding truth, or with making effective plans.
(If there's something you should precommit to, it's not centrally "irrational" to do that thing. Or if it is indeed centrally "irrational" to do it, maybe you shouldn't precommit to it. In this case, it's only "irrational" according to a myopic problem statement that is itself not the right thing to follow. And in the above narrow sense of "rationality" as preference towards better methods rather than merely correctness of individual beliefs and decisions according to given methods, none of these things are either "rational" or "irrational".)
Leaving aside tractability ... neglectedness ... and goodness ... I wanted to argue for importance.
Neglectedness is an unusually legible metric and can massively increase marginal impact. So acute awareness of neglectedness when considering allocation of effort should solve most issues of failing to address every possible point of intervention. Assessment of tractability/goodness/importance poses different puzzles for every hypothetical intervention, and studying these puzzles can itself be a project. Neglectedness is more straightforward; it's a lower-hanging strategic fruit, a reason not to skip assessing tractability/goodness/importance for things nobody is working on, and not to dismiss them out of hand for that reason alone.
That things other than chips need to be redesigned wouldn't argue either way, because in that hypothetical everything could just come together at once, the other things the same way as the chips themselves. The issue is the capacity of factories and labor for all the components, integration, and construction. You can't produce everything all at once; instead you need to produce each kind of thing that goes into the finished datacenters over the course of at least months, maybe as long as 2 years for sufficiently similar variants of a system that can share many steps of the process (as with H100/H200/B200 previously, and now GB200/GB300 NVL72).
How elaborate the production process needs to be also doesn't matter; it just shifts the arrival of the finished systems in time (even if substantially), with the first systems still getting ready earlier than the bulk of them. And so the first 20% of everything (at a given stage of production) will be ready partway into the volume production period (in a broad sense that also includes construction of datacenter buildings or burn-in of racks), significantly earlier than most of it.
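As a toy illustration of that timing claim (the 18-month volume production period and the two ramp shapes are arbitrary assumptions of mine):

```python
import numpy as np

MONTHS = 18                           # assumed volume production period
t = np.linspace(0, MONTHS, 1000)

for name, rate in [("uniform production rate", np.ones_like(t)),
                   ("linear ramp-up", t)]:
    cumulative = np.cumsum(rate)
    cumulative /= cumulative[-1]      # fraction of the total buildout completed
    month_20pct = t[np.searchsorted(cumulative, 0.2)]
    print(f"{name}: first 20% ready around month {month_20pct:.1f} of {MONTHS}")
```

Either way, the first 20% is ready well before most of the buildout.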
Once a model is trained, it needs to be served. If both xAI and OpenAI have their ~6T total param models already trained in Jan 2026, xAI will have enough NVL72 systems to serve the model to all its users at a reasonable speed and price (and so they will), while OpenAI won't have that option at all (without restricting demand somehow, probably with rate limits or higher prices).
A meaningful fraction of the buildout of a given system usually comes online several months to a year before the bulk of it; that's when the first news about cloud access appears. If new inference hardware makes more total params practical than previous hardware, and scaling of hardware amount (between hardware generations) is still underway, then even a fraction of the new hardware buildout will be comparable to the bulk of the old hardware (which is busy anyway) in FLOPs and in the training steps that RL can get out of it, adjusting for better efficiency. And slow rollouts for RL (on old hardware) increase batch sizes and decrease the total number of training steps that fit into a few months, which could also be important.
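A toy sketch of that last mechanism (all numbers are made up, and I hold aggregate generation throughput fixed to isolate the rollout-latency effect): with synchronous RL, each optimizer step waits for a wave of rollouts, so slower rollouts mean fewer sequential steps in the same window and a larger effective batch per step.

```python
WINDOW_HOURS = 3 * 30 * 24               # ~3 months of RL training
FLEET_TOKENS_PER_HOUR = 5e9              # assumed aggregate rollout throughput

for name, hours_per_wave in [("fast rollouts (new hardware)", 0.5),
                             ("slow rollouts (old hardware)", 2.0)]:
    optimizer_steps = WINDOW_HOURS / hours_per_wave
    tokens_per_step = FLEET_TOKENS_PER_HOUR * hours_per_wave  # effective batch
    print(f"{name}: ~{optimizer_steps:.0f} steps, ~{tokens_per_step:.1e} tokens per step")
```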
serving to users might not be that important of a dimension
The new hardware that becomes available before the bulk of it is not uniquely useful for serving users, because there isn't enough of it to serve a flagship model with more total params to all users. So it does seem to make sense to use it for RL training of a new model that will then be served to users on the bulk of the same hardware once enough of it is built.
I don't think there is a delay specific to NVL72; it just takes this long normally, and with all the external customers Nvidia needs to announce things a bit earlier than, say, Google. This is why I expect Rubin Ultra NVL576 (the next check on TPU dominance after 2026's NVL72) to also take similarly long. It's announced for 2027, but 2028 will probably only see completion of a fraction of the eventual buildout, and only in 2029 will the bulk of the buildout be completed (though maybe late 2028 will be made possible for NVL576 specifically, given the urgency and the time to prepare). This would enable companies like OpenAI (without access to TPUs at gigawatt scale) to serve flagship models at the next level of scale (what 2026 pretraining compute asks for) to all their users, catching up to where Google and Anthropic were in 2026-2027 thanks to Ironwood. Unless, that is, Google decides to give yet another of its competitors this crucial resource and allows OpenAI to build gigawatts of TPUs earlier than 2028-2029.
Pretraining (GPT-4.5, Grok 4, but also counterfactual large runs which weren’t done) disappointed people this year. It’s probably not because it wouldn’t work; it was just ~30 times more efficient to do post-training instead, on the margin. This should change, yet again, soon, if RL scales even worse.
Model sizes are currently constrained by availability of inference hardware, with multiple trillions of total params having become practical only in late 2025, and only for GDM and Anthropic (OpenAI will need to wait until later in 2026 for a sufficient GB200/GB300 NVL72 buildout). Using more total params makes even output tokens only slightly more expensive if the inference system has enough HBM per scale-up world, but MoE models get smarter if you allow more total params. At 100K H100s of pretraining compute (2024 training systems), about 1T active params is compute optimal[1], and at 600K Ironwood TPUs of pretraining compute (2026 systems), that's 4T active params. With even 1:8 sparsity, models of 2025 should naturally try to get to 8T total params, and models of 2027 to 30T total params, if inference hardware would allow that.
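For concreteness, here's the rough arithmetic behind those numbers, following the footnote's assumptions (120 tokens per active param, 4 months at 40% utilization in FP8); the per-chip FP8 throughputs are my own approximations, not from the post:

```python
import math

SECONDS = 4 * 30 * 86400             # ~4 months of training
UTILIZATION = 0.4                    # assumed MFU
TOKENS_PER_ACTIVE_PARAM = 120        # compute optimal for a 1:8-sparse MoE (per the footnote)
SPARSITY = 8                         # total params ~= 8x active params

def compute_optimal(num_chips, fp8_flops_per_chip):
    # C = 6 * N_active * D with D = 120 * N_active, so N_active = sqrt(C / 720).
    c = num_chips * fp8_flops_per_chip * UTILIZATION * SECONDS
    n_active = math.sqrt(c / (6 * TOKENS_PER_ACTIVE_PARAM))
    return c, n_active, SPARSITY * n_active

# Assumed dense FP8 throughput per chip: ~2e15 FLOP/s for H100, ~4.6e15 for Ironwood.
for name, chips, flops in [("100K H100s (2024 system)", 1e5, 2e15),
                           ("600K Ironwood (2026 system)", 6e5, 4.6e15)]:
    c, n_active, n_total = compute_optimal(chips, flops)
    print(f"{name}: ~{c:.0e} FLOP, ~{n_active / 1e12:.1f}T active, ~{n_total / 1e12:.0f}T total params")
```

This lands near the ~1T/8T and ~4T/30T figures above; the exact outputs shift with the assumed throughput and utilization.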
Without inference systems with sufficient HBM per scale-up world, models can't be efficiently trained with RL either, so scarcity of such hardware also means large models don't get trained with RL. And since 2025 is the first year RLVR was seriously applied to production LLMs, the process started with the smaller LLMs that allow faster iteration and got through the orders of magnitude quickly.
GPT-4.5, Grok 4
GPT-4.5 was probably a compute optimal pretrain, so plausibly a ~1T active params, ~8T total params model[2], targeting for inference and RL training the NVL72 systems that were not yet available when it was released (in this preliminary form). So it couldn't be seriously trained with RL, and could only be served on older Nvidia 8-chip servers, slowly and expensively. A variant of it with a lot of RL training will likely soon get released to answer the challenge of Gemini 3 Pro and Opus 4.5 (either based on that exact pretrain, or after another run adjusted with lessons learned from the first one, if the first attempt that became GPT-4.5 was botched in some way, as the rumor has it). There's still not enough NVL72 capacity to serve it as a flagship model though, so demand would need to be constrained by prices or rate limits for now.
Grok 4 was the RLVR run, probably over Grok 3's pretrain, and has 3T total params, likely with fewer active params than would be compute optimal for pretraining on 100K H100s. But the number of total params is still significant, and since xAI didn't yet have NVL72 systems (for long enough), its RL training wasn't very efficient.
This should change, yet again, soon
High-end late 2025 inference hardware (Trillium TPUs, Trainium 2 Ultra) is almost sufficient for the models that 2024 compute can pretrain, and plausibly Gemini 3 Pro and Opus 4.5 already cleared this bar, with RL training applied efficiently (using hardware with sufficient HBM per scale-up world) at pretraining scale. Soon GB200/GB300 NVL72 will be more than sufficient for such models, once enough of them are built in 2026. But the next step requires Ironwood; even Rubin NVL72 systems will constrain models pretrained with 2026 compute (which want at least ~30T total params). So unless Google starts building even more giant TPU datacenters for its competitors (which it surprisingly did for Anthropic), there will be another period of difficulty with the practicality of scaling pretraining, until Nvidia's Rubin Ultra NVL576 are built in sufficient numbers sometime in late 2028 to 2029.
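A rough sanity check on the HBM side (the per-system HBM figures and the 1 byte/param FP8 weight size are my own approximations; the KV cache needs room on top of the weights):

```python
# Whether a model's weights alone fit in one scale-up world's HBM.
HBM_TB = {
    "8x H100 server (2024)": 8 * 0.080,   # ~0.6 TB
    "GB200 NVL72": 72 * 0.186,            # ~13.4 TB
    "GB300 NVL72": 72 * 0.288,            # ~20.7 TB
}

for total_params_trillions in (8, 30):
    weights_tb = total_params_trillions * 1.0   # 1 byte/param: 1 TB per trillion params
    for system, hbm_tb in HBM_TB.items():
        verdict = "fits" if weights_tb < hbm_tb else "does not fit"
        print(f"~{total_params_trillions}T params ({weights_tb:.0f} TB of weights) "
              f"{verdict} in {system} ({hbm_tb:.1f} TB HBM)")
```

Serving across many smaller scale-up domains remains possible, just slow and expensive, which is the situation described above for GPT-4.5 on 8-chip servers.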
Assuming 120 tokens/param compute optimal for a MoE model at 1:8 sparsity, and 4 months of training at 40% utilization in FP8 (which currently seems plausibly mainstream; even NVFP4 no longer seems completely impossible in pretraining). ↩︎
Since Grok 5 will be a 6T total param model, intended to compete with OpenAI and targeting the same NVL72 systems, maybe GPT-4.5 is just 6T total params as well: if GPT-4.5 were larger, xAI might've been able to find that out and match its shape when planning Grok 5. ↩︎
When you go through a textbook, there are confusions you can notice but not yet immediately resolve, and these could plausibly become RLVR tasks. To choose and formulate some puzzle as an RLVR task, the AI would need to already understand the context of that puzzle, but then training on that task makes it ready to understand more. Setting priorities for learning seems like a general skill that adapts to various situations as you learn to understand them better. As with human learning, the ordering from more familiar lessons to deeper expertise would happen naturally for AI instances as they engage in active learning about their situations.
So my point is that automating just this thing might be sufficient, and the perception of it as schleppy is exactly the claim that it generalizes. You need expertise sufficient to choose and formulate the puzzles, not yet sufficient to solve them, and this generation-verification gap keeps moving the frontier of understanding forward, step by step, but potentially indefinitely.
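A minimal sketch of what such a task object could look like (names and structure are mine, purely illustrative): the instance formulating the task only needs enough expertise to pose the puzzle in its context and supply a cheap checker, not to solve it.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RLVRTask:
    prompt: str                        # the puzzle, stated in its context
    verify: Callable[[str], float]     # cheap programmatic reward in [0, 1]

def task_from_confusion(context: str, puzzle: str,
                        checker: Callable[[str], bool]) -> RLVRTask:
    """Package a confusion noticed while studying `context` as a verifiable task."""
    return RLVRTask(
        prompt=f"{context}\n\nQuestion: {puzzle}",
        verify=lambda answer: 1.0 if checker(answer) else 0.0,
    )

# Hypothetical usage: the checker only re-runs something the formulating
# instance already trusts, so verification is much cheaper than solving.
task = task_from_confusion(
    context="Chapter 3 derives the closed form for a geometric series.",
    puzzle="What is the sum of 0.5**k for k from 0 to infinity?",
    checker=lambda answer: abs(float(answer) - 2.0) < 1e-6,
)
print(task.verify("2.0"))  # -> 1.0
```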