All of _liminaldrift's Comments + Replies

What about the reports that GPT-5's performance isn't as strong as expected on many tasks due to a lack of sufficient high-quality pretraining data? Isn't that a blocker for scaling to 5e28 FLOPs by 2028?

Though my understanding was that if we were hypothetically able to generate enough training data, these models would continue scaling according to the scaling laws. Are you making the argument that the synthetic data generated by these long-reasoning models will allow us to keep scaling them?

Vladimir_Nesov
There is enough natural text data until 2026-2028, as I describe in the Peak Data section of the linked post. It's not very good data, but with 2,500x the raw compute of the original GPT-4 (and possibly 10,000x-25,000x in effective compute due to algorithmic improvements in pretraining), that's a lot of headroom that doesn't depend on inventing new things (such as synthetic data suitable for improving general intelligence through pretraining the way natural text data is). Insufficient data could in principle be an issue for making good use of 5e28 FLOPs, but actually getting 5e28 FLOPs by 2028 (from a single training system) only requires funding. The decisions about that funding don't need to be made based on the AIs that exist today; they'll be made based on the AIs that exist in 2026-2027, trained on the 1 GW training systems being built this year. With o3-like post-training, the utility and impressiveness of an LLM improve, so the chances of getting such a project funded improve compared to the absence of such techniques.
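For scale, here is a rough back-of-envelope sketch (not part of the comment) of what a compute-optimal 5e28 FLOP run would ask for in data, assuming the common C ≈ 6·N·D approximation, the Chinchilla-style D ≈ 20·N heuristic, and a widely cited outside estimate of ~2e25 FLOPs for GPT-4's pretraining compute:

```python
# Back-of-envelope sketch: how much data a compute-optimal run at 5e28 FLOPs
# would want, under the common C ~= 6*N*D approximation and the
# Chinchilla-style D ~= 20*N heuristic. The GPT-4 figure is an outside
# estimate, not an official number.

GPT4_FLOPS = 2e25     # rough public estimate of GPT-4 pretraining compute
TARGET_FLOPS = 5e28   # the 2028-scale training run discussed above

print(f"raw compute multiple over GPT-4: {TARGET_FLOPS / GPT4_FLOPS:.0f}x")

# Compute-optimal allocation: C = 6*N*D with D = 20*N  =>  N = sqrt(C / 120)
params = (TARGET_FLOPS / 120) ** 0.5
tokens = 20 * params
print(f"compute-optimal params: ~{params:.1e}")
print(f"compute-optimal tokens: ~{tokens:.1e}")
# Roughly 2e13 params and 4e14 tokens -- comparable to or beyond most
# estimates of usable public text, which is why data sufficiency is even
# a question at this scale, and why over-training smaller models or
# repeating data comes up.
```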
garrison
I think that is a problem for the industry, but probably not an insurmountable barrier the way some commentators make it out to be.

1. The o-series of models may be able to produce new high-quality training data.
2. Sufficiently good reasoning approaches + existing base models + scaffolding may be sufficient to get you to automating ML research.

One other thought is that there's probably an upper limit on how good an LLM can get even with unlimited high-quality data, and I'd guess that models would asymptotically approach it for a while. Based on the reporting around GPT-5 and other next-gen models, I'd guess that the issue is lack of data rather than approaching some fundamental limit.
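The "asymptotically approach an upper limit" intuition can be illustrated with the data term of a Chinchilla-style loss fit, L(D) = E + B/D^β, where E is an irreducible floor. The sketch below holds model size fixed and drops the parameter term; the constants are roughly the Hoffmann et al. (2022) fit and are used only to show the shape of the curve, not as a claim about GPT-5:

```python
# Illustrative sketch of the asymptotic-limit point: under a Chinchilla-style
# fit L(D) = E + B / D**beta, more data keeps helping, but the loss can never
# drop below the irreducible term E.

E, B, beta = 1.69, 410.7, 0.28  # irreducible loss, data coefficient, exponent

def loss_from_data(tokens: float) -> float:
    """Data-limited loss under the illustrative fit (model size held fixed)."""
    return E + B / tokens ** beta

for tokens in (1e12, 1e13, 1e14, 1e15):
    print(f"{tokens:.0e} tokens -> loss ~{loss_from_data(tokens):.3f} (floor {E})")
# Each extra 10x of data buys a smaller improvement; the curve flattens
# toward E rather than continuing to drop.
```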

This reminds me of this LessWrong post.

If It’s Worth Doing, It’s Worth Doing With Made-Up Statistics

https://www.lesswrong.com/posts/9Tw5RqnEzqEtaoEkq/if-it-s-worth-doing-it-s-worth-doing-with-made-up-statistics

I think even with humans, IQ isn't the best measure for quantifying what we call intelligence. The way I tend to think of it is that high general intelligence correlates with higher IQ test scores, but optimizing for performance on IQ tests doesn't necessarily make you more intelligent in general outside of that task.

But I'm okay with using IQ scores in the context of this post, because it seems like a useful way to capture the change in these models' capabilities.