"We'll be building a cluster of around 22,000 H100s. This is approximately three times more compute than what was used to train all of GPT4.
This bothers me. It's a naive way of seeing compute. It's like confusing watts and watt-hours.
22,000 H100s is three times the FLOP/s of the cluster used to train GPT-4, so you could train the same model in a third of the time, or with a third of the cluster in the same time.
I think this way of looking at compute invites naive assumptions about what the compute can be used for. And FLOP/s is not a great unit for everyday discourse when we're dealing with 10¹⁵ scales.
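To make the watts vs. watt-hours point concrete, here's a minimal back-of-the-envelope sketch in Python. The GPT-4 training compute (~2×10²⁵ FLOP) and the sustained per-H100 throughput are assumed ballpark figures plugged in for illustration, not official numbers:

```python
# Back-of-the-envelope: FLOP/s is a rate (watts), total FLOP is a
# quantity (watt-hours). All numbers below are rough assumed ballpark
# estimates, not official figures.

GPT4_TRAINING_FLOP = 2e25      # assumed total training compute, in FLOP
H100_EFFECTIVE_FLOPS = 4e14    # assumed sustained FLOP/s per H100
                               # (~40% utilization of ~1e15 peak)

def training_days(num_gpus: int, total_flop: float = GPT4_TRAINING_FLOP) -> float:
    """Wall-clock days to burn through total_flop at the cluster's rate."""
    cluster_rate = num_gpus * H100_EFFECTIVE_FLOPS  # FLOP/s
    seconds = total_flop / cluster_rate             # FLOP / (FLOP/s) = s
    return seconds / 86_400

# Tripling the cluster triples the rate, so a fixed-size run finishes in
# a third of the time; it doesn't make any single run "3x bigger".
print(f"{training_days(7_000):.0f} days")   # ~83 days on a GPT-4-scale cluster
print(f"{training_days(22_000):.0f} days")  # ~26 days on 3x the FLOP/s
```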
"This gap will only widen over time; China is failing to develop a domestic semiconductor industry, despite massive efforts to do so, and is increasingly cut off from international semiconductor supply chains."
I would say this is a falsehood.
The US export ban on controlled GPUs has pushed China hard toward local semiconductor manufacturing and accelerated their projects. They don't have 5nm TSMC-quality wafers, fine, but they're developing the full stack.
I mean, if this were a "the AGI race between the US and Russia doesn't exist" piece, okay, fine. But seeing how more than half the papers that land on arXiv have Chinese authors on them, plus the fact that China does 90% of the world's electronics manufacturing, I don't understand how you come to the conclusion that China is hopelessly dead in the water.
The day the US GPU export ban happened, okay, most of us really wondered. But seeing how they're operating six months to a year afterwards, it's obvious they will be able to make it happen.
I don't think any of that invalidates that Gwern is, as usual, usually right.