with 10-30x AIs, solving alignment takes like 1-3 years of work ... so a crucial factor is US government buy-in for nonproliferation
Those AIs might be able to lobby for nonproliferation or do things like write a better IABIED, making coordination interventions that oppose myopic racing. Directing AIs to pursue such projects could be a priority comparable to direct alignment work. Unclear how visibly asymmetric such interventions will prove to be, but then alignment vs. capabilities work might be in a similar situation.
There doesn't necessarily need to be algorithmic progress to get there; sufficient bandwidth enables traditional pretraining across multiple sites. But it might be difficult to ensure that bandwidth is available across the geographically distributed sites on short notice, if you aren't already a well-established hyperscaler building near your older datacenter sites.
In 2028, a model targeting inference on Rubin Ultra NVL576 (150 TB of HBM in a scale-up world) might want to be a MoE model with 80 TB of total params (80T params if in FP8, 160T in FP4). If training uses the same precision for gradients, that's also 80 TB of gradients to exchange. If averaged gradients use more precision, this could be 2x-8x more data.
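To make the arithmetic explicit, here's a minimal sketch of the memory and gradient-volume numbers (assuming decimal TB, 1 byte/param for FP8 and 0.5 byte/param for FP4; the 150 TB and 80 TB figures are the ones stated above):

```python
# Back-of-the-envelope check of the param and gradient sizes above.
# Assumptions (from the text): ~150 TB of HBM per Rubin Ultra NVL576
# scale-up world, ~80 TB budgeted for weights, gradients exchanged at
# 1x to 8x the weight precision.

TB = 1e12  # decimal terabytes throughout

hbm_per_world = 150 * TB
weight_bytes = 80 * TB  # leaves ~70 TB for KV cache, activations, etc.

params_fp8 = weight_bytes / 1.0    # 1 byte per param
params_fp4 = weight_bytes / 0.5    # 0.5 byte per param
print(f"FP8: {params_fp8:.1e} params, FP4: {params_fp4:.1e} params")
# -> FP8: 8.0e+13 (80T) params, FP4: 1.6e+14 (160T) params

# Gradient volume per exchange, if averaged gradients use 1x-8x more bits
for factor in (1, 2, 8):
    print(f"{factor}x precision: {weight_bytes * factor / TB:.0f} TB per gradient exchange")
# -> 80 TB, 160 TB, 640 TB
```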
If training is done using 2 GW of some kind of Rubin GPUs, that's about 2e22-3e22 FP4 FLOP/s, and at 30% utilization for 4 months it produces about 8e28 FP4 FLOPs. At 120 tokens/param (anchoring to 40 tokens/param for the dense Llama 3 405B and adjusting 3x for 1:8 sparsity), this system might want about 10T active params (so we get 1:16 sparsity, with 160T total FP4 params, or about 1:8 for FP8). This needs 1,200T tokens, maybe 250T unique (about 5 epochs of repetition), which is a problem, but not yet orders of magnitude beyond the pale, so probably something can still be done without needing bigger models.
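A quick script reconstructing this estimate (a sketch under the stated assumptions: 2.5e22 FP4 FLOP/s as the midpoint of the range, 30% utilization, 4 months, 6 FLOPs per active param per token, 120 tokens per active param):

```python
# Rough reconstruction of the pretraining arithmetic above.

SECONDS_PER_MONTH = 30 * 24 * 3600

flops_per_s = 2.5e22          # middle of the 2e22-3e22 FP4 FLOP/s range
utilization = 0.30
train_seconds = 4 * SECONDS_PER_MONTH
total_flops = flops_per_s * utilization * train_seconds
print(f"total compute ~ {total_flops:.1e} FP4 FLOPs")   # ~7.8e28, i.e. ~8e28

tokens_per_active_param = 120   # 40 (dense Llama 3 405B anchor) * 3 for sparsity
flops_per_param_token = 6       # standard forward+backward estimate

# total_flops = 6 * active * tokens, with tokens = 120 * active
active_params = (total_flops / (flops_per_param_token * tokens_per_active_param)) ** 0.5
tokens = tokens_per_active_param * active_params
print(f"active params ~ {active_params:.1e}, tokens ~ {tokens:.1e}")
# -> ~1.0e13 (10T) active params, ~1.2e15 (1,200T) tokens

total_params_fp4 = 160e12
print(f"sparsity ~ 1:{total_params_fp4 / active_params:.0f}")   # ~1:15, close to 1:16
```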
With large scale-up worlds, processing a 32K token sequence on a non-CPX Rubin NVL144 rack at 30% utilization would take just 2.7 seconds (for pretraining). A 2 GW system has 9K racks, so even one sequence per rack gives a batch of 300M tokens, which is already a lot (Llama 3 405B used 16M token batches in the main phase of pretraining), so that should be the target characteristic time for exchanging gradients.
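A sketch of the step-time and batch arithmetic, assuming the per-rack FP4 throughput is just the system total above divided by 9K racks, and that each rack handles one 32K sequence per step:

```python
# Step-time and batch-size check under the assumptions above.

system_fp4_flops = 2.5e22     # FP4 FLOP/s for the whole 2 GW system (midpoint)
racks = 9_000
utilization = 0.30
active_params = 1e13          # ~10T active params from the previous estimate
seq_len = 32_768

rack_effective_flops = system_fp4_flops / racks * utilization
flops_per_sequence = 6 * active_params * seq_len
step_seconds = flops_per_sequence / rack_effective_flops
print(f"per-rack step time ~ {step_seconds:.1f} s")
# -> ~2.4 s, in the same ballpark as the ~2.7 s above

batch_tokens = racks * seq_len
print(f"global batch ~ {batch_tokens / 1e6:.0f}M tokens")   # ~295M, i.e. ~300M
```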
Moving 80 TB in 2.7 seconds needs 240 Tbps, or 500-2,000 Tbps if averaged gradients use 2x-8x more precision bits (even more if not all-to-all, which is likely with more than 2 sites), and this already loses half of utilization or asks for even larger batches. A DWDM system might transmit 30-70 Tbps over a fiber optic pair, so this is 4-70 fiber optic pairs, which seems in principle feasible to secure for overland fiber cables (which hold hundreds of pairs), especially towards the lower end of the estimate.
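And the bandwidth arithmetic, under the same assumptions (80 TB of gradients, 2.7 s characteristic step time, 30-70 Tbps per DWDM fiber pair):

```python
import math

# Cross-site bandwidth check for the figures above.
grad_bytes = 80e12
step_seconds = 2.7
Tbps = 1e12

base_tbps = grad_bytes * 8 / step_seconds / Tbps
print(f"base requirement ~ {base_tbps:.0f} Tbps")          # ~240 Tbps

low, high = base_tbps * 2, base_tbps * 8  # 2x-8x precision for averaged gradients
print(f"with higher-precision gradients: {low:.0f}-{high:.0f} Tbps")  # ~470-1,900 Tbps

for tbps_per_pair in (70, 30):
    lo_pairs = math.ceil(base_tbps / tbps_per_pair)
    hi_pairs = math.ceil(high / tbps_per_pair)
    print(f"at {tbps_per_pair} Tbps/pair: {lo_pairs} to {hi_pairs} fiber pairs")
# -> 4 to 28 pairs at 70 Tbps/pair, 8 to 64 at 30 Tbps/pair,
#    i.e. roughly the 4-70 range above
```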
Peter Wildeford: I’d expect the 10GW OpenAI cluster becomes operational around 2027-2028.
There are no 10 GW OpenAI clusters; there is a conditional investment by Nvidia for every additional 1 GW in total across all their datacenters, which won't form a single training system. Inference and smaller training experiments need a lot of compute, and OpenAI is now building their own datacenters even for inference, so most of what they are building is not for frontier model training.
Zvi Mowshowitz: these announcements plus the original site in Abilene, Texas cover over $400 billion and 7 gigawatts over three years
Depending on OpenAI growth, this is more of a soft upper bound on what gets built. This is evidence that 5 GW training systems likely aren't going to be built by (end of) 2028. So there's going to be a slowdown compared to the trend of 12x in compute (and 6x in power) every 2 years for the largest frontier AI training systems, which held in 2022-2026[1]. In 2027-2028, the largest training systems are likely going to be merely 2 GW instead of the on-trend 5 GW. Though for the 2024 systems, FP8 is likely relevant, and for 2026 systems maybe even FP4, which turns the 12x in compute every 2 years in 2022-2026 into 24x in compute every 2 years (5x per year in pretraining-relevant raw compute for a single training system).
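Using the figures from the footnote, a quick check of these growth rates (the 2x precision adjustment per 2-year step for FP8 and then FP4 is the assumption stated above, not a measured number):

```python
# Growth-rate check for the 2022-2026 trend, from the footnote figures.

systems = {  # year: (BF16 FLOP/s, all-in MW)
    2022: (7e18, 22),
    2024: (1e20, 150),
    2026: (1e21, 900),
}

def geomean(a, b):
    return (a * b) ** 0.5

compute_step = geomean(systems[2024][0] / systems[2022][0],
                       systems[2026][0] / systems[2024][0])
power_step = geomean(systems[2024][1] / systems[2022][1],
                     systems[2026][1] / systems[2024][1])
print(f"compute: ~{compute_step:.0f}x per 2 years")   # ~12x
print(f"power:   ~{power_step:.1f}x per 2 years")     # ~6.4x

# If FP8 (2024) and then FP4 (2026) each add a further 2x of
# pretraining-relevant raw compute per 2-year step:
adjusted = compute_step * 2
print(f"precision-adjusted: ~{adjusted:.0f}x per 2 years, ~{adjusted ** 0.5:.1f}x per year")
# -> ~24x per 2 years, ~4.9x (about 5x) per year
```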
This 24x every 2 years is even less plausible to remain on-trend in 2028, so 2027+ is going to be the time of scaling slowdown, at least for pretraining. Though the AIs trained on 2026 compute might only come out in 2027-2028, judging by how 2024 training compute is still held back by inference capacity, with some AIs enabled by 2024 levels of compute perhaps only coming out in 2026. So the slowdown in the scale of frontier AI training systems after 2026 might only become observable in the scaling of deployed AIs in 2028-2029.
Perhaps some of these sites will be connected with sufficient bandwidth, and training with RLVR at multiple sites doesn't need a lot of bandwidth. But actual plans for larger training runs should push the builders toward larger individual sites, since that preserves optionality for unusual training processes. So the fact that this isn't happening suggests that there are no such plans for now (except perhaps for RLVR-like things specifically).
24K A100s in 2022 (7e18 BF16 FLOP/s, 22 MW), 100K H100s in 2024 (1e20 BF16 FLOP/s, 150 MW), 400K chips in GB200/GB300 NVL72 racks in 2026 (1e21 BF16 FLOP/s, 900 MW). The power estimates are all-in, for the whole datacenter site. ↩︎
I think the book is saying less that it's obvious that ASI will kill us all but that it is inevitable that ASI will kill us all, and so our only option is to make sure nobody builds it. I do think this is a pretty fair gloss
Crucial caveat: this is conditional on building it soon, rather than first preparing to an unprecedented degree. Probably you are tracking this, but when you say it like that, someone without context might take the intended meaning to be unconditional inevitable lethality of ASI, which is very different. The claim is that our only option is that nobody builds it soon, not that nobody builds it ever.
it is trying to argue that future alignment techniques won't be adequate, because the problem is just too hard
This is still about future alignment techniques that can become available soon. Reasonable counterarguments to the inevitability of ASI-caused extinction or takeover, if it's created soon, seem to be mostly about AGIs developing meaningfully useful alignment techniques soon enough (and if not soon enough, an ASI Pause of some kind would help, but then AGIs themselves are almost as big of a problem).
Fast takeoff is not takeover is not extinction. For example, gradual disempowerment without a fast takeoff leads to takeover, which may lead to either extinction or permanent disempowerment, depending on the values of the AIs.
I think it's quite plausible that AGIs merely de facto take over frontier AI R&D, with enough economic prosperity and human figureheads to ensure humanity's complacency. And when superintelligence arrives later, humanity might notice that it's left with a tiny, insignificant share of the resources of the reachable universe and no prospect at all of ever changing this, even on cosmic timescales.
Hence more well-established cryonics would be important for civilizational incentives, not just personal survival.
Genetic enhancement seems like a safe-ish way of gaining a few standard deviations without yet knowing what you are really doing, one that current humanity could actually attempt in practice. And that might help a lot both with the "knowing what you are doing" part, and with not doing irreversible things without knowing what you are doing. Any change risks misalignment: uplifting to a superintelligence requires ASI-grade alignment theory and technology, and even lifespans for baseline biological humans that run into centuries risk misalignment (since that has never happened before). There's always cryonics, which enables waiting for future progress, if civilization were at all serious about it.
So when you talk about "merging with AI", that is very suspicious, because a well-developed uplifting methodology doesn't obviously look anything like "merging with AI". You become some kind of more capable mind that's different from what you were before, without taking irreversible steps towards something you wouldn't endorse. Without such a methodology, it's a priori about as bad an idea as building superintelligence in 2029.
That's the intended meaning, I go into more detail in the linked post. Hence "the future of humanity" rather than simply "humanity", something humanity would endorse as its future, which is not exclusively (or at all) biological humans. Currently living humans could in principle develop tools to uplift themselves all the way to star-sized superintelligences, but that requires a star, while what humans might instead get is a metaphorical server rack, hence permanent disempowerment.
My comment is primarily an objection to vague terminology that doesn't distinguish permanent disempowerment from extinction. Avoiding permanent disempowerment seems like the correct shared cause, while the cause of mere non-extinction has many ways of endorsing plans that lead to permanent disempowerment. And not being content with permanent disempowerment (even under the conditions of eutopia within strict constraints on resources) depends on noticing that more is possible.
If one thinks the chance of an existential disaster is "anywhere between 10% and 90%", one should definitely worry about the potential of any plan to counter it to backfire.
Is permanent disempowerment (where the future of humanity only gets a tiny sliver of the reachable universe) an "existential disaster"? It's not literal extinction; "existential risk" can mean either, and the distinction can be crucial.
I think permanent disempowerment (or extinction) is somewhere between 90% and 95% unconditionally, and north of 95% conditional on building a superintelligence by 2050. But literal extinction is only between 10% and 30% (on current trajectory). The chances improve with interventions such as a lasting ASI Pause, including an AGI-led ASI Pause, which makes it more likely that ASIs are at least aligned with the AGIs. A lasting AGI Pause (rather than an ASI Pause) is the only straightforward and predictably effective way to avoid permanent disempowerment, and a sane civilization would just do that, with some margin: pausing at even weaker AIs and even worse hardware.
Dodging permanent disempowerment (rather than merely extinction) without an AGI Pause likely needs AGIs that somehow both haven't taken over and are simultaneously effective enough at helping with the ASI Pause effort. This could just take the form of allocating 80% of AGI labor or whatever to ASI alignment projects, so that capabilities never outpace the ability to either contain or avoid misalignment. So it's not necessarily a literal Pause, when there are AGIs around that can set up lasting institutions with inhuman levels of robustness, capable of implementing commitments to pursue ASI alignment that are less blunt than a literal Pause yet still effective. But for the same reasons that this kind of thing might work, it seems unlikely to work without an AGI takeover.
My point is that the 10-30x AIs might be more effective at coordination around AI risk than humans alone, in particular more effective than currently seems feasible in the relevant timeframe (when not taking into account the use of those 10-30x AIs). Saying "labs" doesn't make this distinction explicit.