Flagship models need inference compute at gigawatt scale, with a lot of HBM per scale-up world. Nvidia's systems are currently a year behind in serving models with trillions of total params, and will remain behind until 2028-2029 for models with tens of trillions of total params. Thus if OpenAI fails to secure TPUs or some other alternative to Nvidia at gigawatt scale, it will remain unable to serve a model with a competitive total param count as its flagship until late 2028 to 2029. There will be a window in 2026 when OpenAI catches up, but then it falls behind again.
The current largest flagship models are Gemini 3 Pro and Opus 4.5, probably at multiple trillions of total params, requiring systems with multiple TB of HBM per scale-up world to serve efficiently. They are likely served on Trillium (TPUv6e, 8 TB of HBM per scale-up world) and Trainium 2 Ultra (6 TB per scale-up world) respectively, and need high hundreds of megawatts of such systems to serve their user bases.
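To make "multiple trillions of total params" concrete, here is a rough back-of-the-envelope sketch of how much HBM one replica of the weights occupies within a single scale-up world. The scale-up world capacities are the figures above; the parameter counts and the 1-byte-per-param (FP8/INT8) serving precision are illustrative assumptions, not disclosed figures:

```python
# Back-of-the-envelope: HBM needed to hold one replica of a model's weights
# inside a single scale-up world. Parameter counts and quantization are
# illustrative assumptions, not disclosed figures.

def weights_hbm_tb(total_params_t: float, bytes_per_param: float) -> float:
    """HBM (in TB) for one copy of the weights, ignoring KV cache and activations."""
    return total_params_t * 1e12 * bytes_per_param / 1e12

SCALE_UP_WORLDS_TB = {
    "TPUv6e pod (Trillium)": 8.0,
    "Trainium 2 Ultra": 6.0,
}

for name, hbm_tb in SCALE_UP_WORLDS_TB.items():
    for params_t in (2.0, 4.0):  # "multiple trillions" of total params (assumed)
        need = weights_hbm_tb(params_t, bytes_per_param=1.0)  # FP8/INT8 (assumed)
        fits = "fits" if need < hbm_tb else "does not fit"
        print(f"{params_t:.0f}T params needs ~{need:.0f} TB of HBM; "
              f"{fits} in {name} ({hbm_tb:.0f} TB), before KV cache")
```

Under these assumptions a 2-4T param model fits with room left over for KV cache, which is what "multiple TB of HBM per scale-up world" buys.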
Nvidia's system in this class is GB200/GB300 NVL72 (14/20 TB per scale-up world), but so far not enough of it has been built, so models served on Nvidia's older hardware (H100/H200/B200, 0.6-1.4 TB per 8-chip scale-up world) either have to remain smaller or become more expensive. The smaller number of NVL72s currently in operation can only serve large models to a smaller user base. As a result, OpenAI will probably have to keep the smaller GPT-5 as its flagship model until it and Azure build enough NVL72s, which will happen sometime in mid to late 2026 (the bigger model will very likely be released much earlier than that, perhaps even imminently, but will have to remain heavily restricted by either price or rate limits). Paradoxically, xAI might be in a better position as a result of having fewer users, and so might be able to serve its 6T total param Grok 5 at a reasonable price starting in early 2026.
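The same arithmetic shows why the HBM ceiling of a single scale-up world, not total fleet size, is the binding constraint. A minimal sketch, reusing the per-system figures from above; the 6T param count echoes the Grok 5 estimate, while the serving precision and the KV-cache headroom multiplier are assumptions:

```python
# One weight replica (plus KV-cache headroom) must fit inside a single
# NVLink/ICI domain. HBM figures per scale-up world are from the text;
# precision and headroom multiplier are assumptions.

MODEL_PARAMS_T = 6.0   # e.g. the estimated Grok 5 total param count
BYTES_PER_PARAM = 1.0  # assumed FP8/INT8 serving precision
KV_HEADROOM = 1.5      # assumed multiplier for KV cache + activations

needed_tb = MODEL_PARAMS_T * BYTES_PER_PARAM * KV_HEADROOM

for system, hbm_tb in {
    "H100 x8 node": 0.6,
    "H200 x8 node": 1.1,
    "B200 x8 node": 1.4,
    "GB200 NVL72": 14.0,
    "GB300 NVL72": 20.0,
}.items():
    verdict = "OK" if hbm_tb >= needed_tb else "too small"
    print(f"{system:>13}: {hbm_tb:5.1f} TB vs ~{needed_tb:.0f} TB needed -> {verdict}")
```

At these assumptions a 6T-param model clears an NVL72 comfortably but is an order of magnitude beyond any 8-chip node, which is why fleets of older Nvidia hardware cap flagship model size rather than merely serving it more slowly.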
But then in 2026, there is a gigawatt-scale buildout of Ironwood (TPUv7)