This reminds me of the LessWrong post "If It's Worth Doing, It's Worth Doing With Made-Up Statistics".
I think that even with humans, IQ isn't the best measure of what we call intelligence. The way I think of it is that high general intelligence correlates with higher IQ scores, but optimizing specifically for IQ tests doesn't necessarily make you more intelligent outside of that task.
That said, I'm fine with using IQ scores in the context of this post, since they seem like a useful way to capture how these models' capabilities are changing.
What about the reports that GPT-5's performance isn't as strong as expected on many tasks because of a lack of high-quality pretraining data? Isn't that a blocker for scaling to 5e28 FLOPs by 2028?
My understanding, though, was that if we could hypothetically generate enough training data, these models would keep improving according to the scaling laws. Are you arguing that the synthetic data generated by these long-reasoning models will let us keep scaling them?
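
To make the data-bottleneck worry concrete, here's a rough back-of-envelope sketch using the usual Chinchilla rules of thumb (C ≈ 6·N·D training FLOPs, compute-optimal D ≈ 20·N tokens). The constants and the "available tokens" figure are my own assumptions for illustration, not numbers from the post:

```python
# Back-of-envelope: compute-optimal token requirement at a 5e28 FLOP budget.
# Assumptions (mine, not the post's): C ~ 6*N*D and D ~ 20*N (Chinchilla
# rules of thumb); the "available_tokens" figure is a rough placeholder
# for high-quality web text, not a measured number.

compute_budget = 5e28                      # training FLOPs (the 2028 figure)

# C = 6*N*D with D = 20*N  =>  C = 120*N^2
params = (compute_budget / 120) ** 0.5     # ~2e13 parameters
tokens_needed = 20 * params                # ~4e14 tokens (~400T)

available_tokens = 5e13                    # assumed ~50T high-quality tokens

print(f"params:            {params:.2e}")
print(f"tokens needed:     {tokens_needed:.2e}")
print(f"shortfall factor:  {tokens_needed / available_tokens:.1f}x")
```

Under those assumptions the compute-optimal run wants roughly an order of magnitude more tokens than plausibly exists as high-quality text, which is why the question of whether synthetic data from reasoning models can fill the gap seems central.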