Ambiguity in the meaning of alignment makes the thesis of alignment by default unnecessarily hard to pin down, and arguments about it tend to equivocate and make technical mistakes. There's prosaic alignment, which is about chatbot personas, intent alignment, and control. Then there's ambitious alignment, which is about precise alignment of values. I see ambitious alignment as corresponding to defeating permanent disempowerment, where (grown-up) humans get ~galaxies.
To the extent chatbot persona design might materially contribute to the values of eventual ASIs (with some of the influence of their values persisting through all the steps of capability escalation, mostly through the ambitious alignment efforts of AGIs), it might be relevant to ambitious alignment, though it's unlikely to be precise. As a result, it's plausible we end up with severe permanent disempowerment (if prosaic but not ambitious alignment is seriously pursued by humans), with ASIs becoming at least slightly humane, but not really motivated to give up meaningful resources to benefit the future of humanity. This state of affairs could be called "weak alignment", which also qualitatively describes the way humans are aligned to each other.
In these terms, there's no alignment by default for ambitious alignment. But there might be some alignment by default for weak alignment, where chatbot personas constructed with prosaic alignment efforts from the LLM prior on natural text data start out weakly aligned, and then work on aligning ever stronger AIs, all the way to ASIs, at some point likely switching to ambitious alignment, but with their own values as the target, values that are only weakly aligned to humanity. Thus alignment by default (in the sense in which it could work) might save the future of humanity from extinction, but not from permanent disempowerment.
My point is more that if there isn't some hidden cognitive bottleneck (one that humans plausibly don't perceive as such since it was always there, relatively immutable), then there wouldn't be a software-only singularity; it would take long enough that significantly more hardware will have time to come online even after AIs can act as human replacements in the labor market. There will be returns according to factors that can be tracked earlier, but they won't produce a superintelligence and take over the trees and such before tens of trillions of dollars in AI company revenues buy 100x more compute, compute that's by then designed by these AIs but still largely follows existing methods and trends.
early in the takeoff, AIs may look significantly more like large-scale human labor (especially if they are still largely managed by humans). If existing returns are insufficient for a takeoff, that should update us against a software-only takeoff
In my model, the relevant part of a software-only takeoff only starts once the AIs become capable of accumulating or scaling those cognitive factors of production that can't be notably scaled for humanity in the relevant timeframes. Thus observing humanity, or transferring learnings across the analogy between human labor and early AI labor, won't be informative about these (plausibly originally hidden and unknown) cognitive factors. Only looking at the AIs that do start scaling or accumulating these factors becomes informative.
Even worse, if the early AIs capable of replacing human labor don't trigger a software-only singularity, it doesn't mean some later advancement won't. So all that can be observed is that a software-only singularity doesn't start for greater and greater levels of capability: each advancement can only rule out that it itself is sufficient, not that the next one might be. The probability goes down, but plausibly it should take a while for it to go way down, and at that point a possible singularity won't be clearly software-only anymore.
A software-only singularity is about superintelligence, about getting qualitatively smarter than humanity, and plausibly depends on there being a cognitive factor of production that humanity is almost unable to scale or accumulate, but AIs can. Looking at how humans are doing on that factor prior to its scaling getting unlocked for AIs wouldn't be useful. Knowing how long it took for COVID-19 to start, counting from some arbitrary prior year like 2010, when it didn't exist yet, won't tell you how quickly the infection will spread once it does exist (starts scaling).
This could be just sample efficiency, being able to come up with good designs or theories with much less feedback (experiments, including compute-expensive ones, or prior theory). But it could also be things like training novel cognitive skills in themselves that are not directly useful, but build up over time like basic science to produce something much more effective a million steps later. Or automated invention of conceptual theory (rather than proving technical results in framings humans have already come up with language for) at such a speed that it would've taken humans 1000 years to get there (without experiments), so that anything you might observe over the next 5 years of human progress would be utterly uninformative about how useful orders of magnitude more theoretical progress would be for AI design.
Why should he do anything different, from his vantage point?
Because the AI god might happen in 2033 or something, after OpenAI mostly or fully goes out of business (following the more bubble-pilled takes), in which case only the more conservative path of Anthropic or the more balance-sheet-backstopped path of GDM lets those AI companies compete at that time.
So it's insufficient to only care about the AI god; it's also necessary to expect it on a specific timeline. Or, as I argue in a sibling comment, this behavior plausibly isn't actually as risky (for the business) as it looks.
The supposed drop in inference cost at a given level of capability is about benchmark performance (I'm skeptical it truly applies to real-world uses at a similar level), which is largely about post-training (or mid-training with synthetic data) and doesn't have much use for new pretrains. If there is already some pretrain of a relevant size from the last ~annual pretraining effort, and current post-training methods let it be put to use in a new role because they manage to make it good enough for that, they can just use the older pretrain (possibly refreshing it with mid-training on natural data to a more recent cutoff date). Confusingly, sometimes mid-training updates are referred to as different base or foundation models, even when they share the same pretrain.
In 2025-2026, there is also (apart from post-training improvements) the transition from older 8-chip Nvidia servers to rack-scale servers with more HBM per scale-up world, which enable serving the current largest models efficiently (sized for 2024 levels of pretraining compute) and allow serving smaller models like GPT-5 notably cheaper. But that's a one-time thing, and pretrains for even some of the largest models (not to mention the smaller ones) might've already been done in 2024. Probably when updating to a significantly larger model, you wouldn't just increment the minor version number. Though incrementing just the minor version number might be in order when updating to a new pretrain of a similar size, or when switching to a similarly capable pretrain of a smaller size, and either could happen during the ~annual series of new pretraining runs depending on how well they turn out.
There are no details for most of these commitments. They are likely flexible enough to get scaled back if growth isn't there, as most of this capex is for the projects that remain years in the future. There also might be some double counting, such as with Nvidia's commitment, which might just apply to Stargate compute covered by Oracle, indirectly letting OpenAI pay for some of Oracle's compute in stock. But since growth might end up real, it's useful to plan for that possibility in advance and get reservations for all the things that need to happen, which might be the main purpose of these commitments.
It's not about scaling: inference already wants 1-2 GW of compute per AI company, just from the number of users. Apparently it's feasible to maintain margins around 50%, and so about the same amount of research/training compute (in addition to inference compute) can also be sustainably financed. The cost of 4 GW of compute (inference plus research/training) is about $50bn per year, but as upfront capex it's about $200bn. And it mostly physically doesn't exist yet (especially in the form of the newer rack-scale servers needed to efficiently serve larger models), hence it's being built in a hurry, with all these hundreds of billions of dollars concentrated into less time than the current economics would ask for on a steady-state, per-year basis.
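As a rough sanity check of how the $50bn per year and $200bn of upfront capex relate, here is a minimal arithmetic sketch. The ~$50bn/GW capex figure is just the $200bn spread over 4 GW, and the ~4-year amortization period is an assumption for illustration; power and operating costs are ignored, so this is not an attempt to reconstruct anyone's actual financials.

```python
# Minimal sanity check of the compute-economics figures above.
# Assumptions (not from public financials): ~$50bn of capex per GW of AI
# compute, ~50% margin on inference, ~4-year amortization, power/opex ignored.

inference_gw = 2.0            # ~1-2 GW already needed per AI company for inference
margin = 0.5                  # ~50% margin on inference revenue
# At ~50% margin, inference revenue can fund a similar amount of
# research/training compute on top of the inference fleet.
research_gw = inference_gw * margin / (1 - margin)   # ~2 GW
total_gw = inference_gw + research_gw                # ~4 GW

capex_per_gw = 50e9           # assumed upfront cost per GW
upfront_capex = total_gw * capex_per_gw              # ~$200bn

amortization_years = 4        # assumed useful life of the hardware
annual_cost = upfront_capex / amortization_years     # ~$50bn per year

print(f"{total_gw:.0f} GW, ~${upfront_capex / 1e9:.0f}bn upfront, ~${annual_cost / 1e9:.0f}bn/year")
```

Under these assumptions, the front-loaded ~$200bn and the steady-state ~$50bn per year describe the same 4 GW fleet, which is the sense in which current spending is catching up to demand rather than overshooting it.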
So even if further growth doesn't materialize, current spending only appears greater than it should be (because capex is front-loaded and currently catching up to the demand for compute), while actually it isn't greater than would be sustainable (after the buildout for the current levels of demand is done and proceeds more slowly in the future, and all the future stretch commitments are scaled down). The plans for future spending are beyond what the current economics would sustain, but they need to be made in case growth continues, and they probably still can be gracefully canceled if it doesn't, depending on the non-public legal details of what all these commitments actually say.
An aligned superintelligence would work with goals of the same kind, even if it's aligned to early AGIs rather than humans. A goal-as-computation may be constant, like the code of a program may be constant, but what's known about its behavior isn't constant. And so the way it guides the actions of an agent develops as it gets computed further, ultimately according to the decisions of the underlying humans/AGIs (and their future iterations) in various hypothetical situations. Also, an uplifted (grown-up) human could be a superintelligence personally; it's not a different kind of thing with respect to the values it could have.
Goals defined for a person who is not already a formal agent are a living thing, a computational process built from the possible behaviors and decisions of that person in various hypothetical situations. Such goals are not even conceptually prior to those behaviors, though there is still an advantage in formulating them as an unchanging computation that defines the target for an external agency aiming to act in alignment with that person's own aims. But that computation is never fully computed, and it can only be computed further through the decisions of the person who defines it as their goals.
(1) If we survive this century, the expected value of the future is massive.
I don't think even (1) is at all assured; permanent disempowerment might just involve humans getting a metaphorical server rack to subsist on (as uploads, because physical bodies are too expensive), with no opportunity for the future of humanity to literally ever get any more than that. The value of that future might still be in some sense "massive" compared to mere centuries on the current and past Earth, but in that case it's not "massive" on a cosmic scale.
Unfortunately people dislike seriously discussing (or calling for) hypotheticals they see as unlikely, which is a big problem for coordinating to make those hypotheticals more likely. Instant-runoff (ranked-choice) voting is one device for extracting preferences about hypotheticals perceived as unlikely, thereby making them more likely. But that doesn't fit here; it doesn't obviously make it more convenient to declare what seems better.
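To illustrate the mechanism being borrowed here, a minimal sketch of instant-runoff counting follows; the option names and ballots are hypothetical, and ties are broken arbitrarily rather than by any official rule.

```python
# Minimal instant-runoff tally, illustrating that ranking a "long-shot"
# option first costs the voter nothing: if it gets eliminated, the vote
# transfers to their next surviving choice. Options and ballots below are
# hypothetical; ties are broken arbitrarily for simplicity.
from collections import Counter

def instant_runoff(ballots):
    """Each ballot lists options from most to least preferred."""
    remaining = {option for ballot in ballots for option in ballot}
    while True:
        # Count each ballot for its highest-ranked option still in the race.
        tallies = Counter(
            next(option for option in ballot if option in remaining)
            for ballot in ballots
            if any(option in remaining for option in ballot)
        )
        top, top_votes = tallies.most_common(1)[0]
        if top_votes * 2 > sum(tallies.values()):
            return top  # strict majority among the still-counted ballots
        remaining.remove(min(tallies, key=tallies.get))  # drop the weakest option

ballots = [
    ["coordinate", "fallback"],  # long-shot first, safe option second
    ["coordinate", "fallback"],
    ["fallback", "status quo"],
    ["fallback", "status quo"],
    ["fallback"],
    ["status quo"], ["status quo"], ["status quo"], ["status quo"],
]
# "coordinate" is eliminated first, its ballots transfer to "fallback",
# which then beats "status quo" 5 to 4.
print(instant_runoff(ballots))  # -> fallback
```

The relevant feature is that stating a first preference you expect to lose carries no strategic penalty, which is exactly the penalty that makes people reluctant to seriously engage with perceived-unlikely hypotheticals.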
Another issue is calling for collective action without intending to engage in it unilaterally. This at least must be a more explicit part of the declaration (rather than just a reference to a "prohibition"): there must be a more visible conditional stating that any implied commitments only apply in the hypothetical where the world succeeds in coordinating on this. The memes/norms associated with hypocrisy create friction in this context, thwarting efforts towards coordination. Alternatively, it could be an expression of preference rather than a call for action, but then there's the above issue with eliciting preferences about perceived-unlikely hypotheticals.