AI takeoff story: a continuation of progress by other means
Thanks to Vladimir Mikulik for suggesting that I write this, and to Rohin Shah and Daniel Kokotajlo for kindly providing feedback.

Prologue

This is a story about a universe a lot like ours. In this universe, the scaling hypothesis — which very roughly says that you can make an AI smarter just by making it bigger — turns out to be completely right. It's gradually realized that advances in AI don't arise from conceptual breakthroughs or sophisticated deep learning architectures. Just the opposite: the simpler the architecture, the better it turns out to perform at scale. Past a certain point, clever model-building was just slowing down progress.

Researchers in this universe discover a rough rule of thumb: each neural network architecture has an intrinsic maximum potential intelligence, or "capability". When you train a network on a problem, how close it gets to reaching its potential capability depends on two limiting factors: 1) the size and diversity of its dataset; and 2) the amount of compute that's used to train it. Training a network on a quadrillion games of tic-tac-toe won't make it smart, but training a network on a quadrillion-word corpus of text might just do it. Even data cleaning and quality control don't matter too much: as long as you have scale, if you train your system long enough, it learns to separate signal from noise automatically.

Generally, the more parameters a neural network has, the higher its potential capability. Neural nets with simple architectures also have a higher potential capability than nets with more sophisticated architectures. This last observation takes the research community longer to absorb than you might expect — it's a bitter lesson, after all — so the groups that internalize it first have an early edge.

Frontier AI projects begin to deemphasize architecture innovations and all but the most basic data preprocessing. They focus instead on simple models, huge datasets, hard problems, and abundant compute. Initial prog
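The rule of thumb above can be caricatured as a toy model. To be clear, every function name, functional form, and constant below is invented purely for illustration — the story specifies only the qualitative shape: a capability ceiling that rises with parameter count and falls with architectural sophistication, approached as data and compute grow, with the scarcer of the two as the binding constraint.

```python
import math


def potential_capability(n_params: float, architecture_complexity: float) -> float:
    """Toy ceiling on capability: grows with parameter count and shrinks
    with architectural sophistication (the 'bitter lesson' effect).
    The log and the division are arbitrary illustrative choices."""
    return math.log10(n_params) / architecture_complexity


def realized_capability(
    n_params: float,
    architecture_complexity: float,
    data: float,
    compute: float,
) -> float:
    """Realized capability approaches the ceiling as data and compute grow;
    whichever of the two saturating factors is smaller is the binding
    constraint. The half-saturation constants are made up."""
    ceiling = potential_capability(n_params, architecture_complexity)
    data_factor = data / (data + 1e9)          # -> 1 as dataset grows
    compute_factor = compute / (compute + 1e21)  # -> 1 as compute grows
    return ceiling * min(data_factor, compute_factor)
```

Under this sketch, a quadrillion games of tic-tac-toe would correspond to a large `data` value but a tiny effective ceiling (the task can't exercise the model), while a simple architecture at huge scale pushes both the ceiling and the saturating factors up.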
