I see some discussion here and in the associated Reddit thread about more efficient and smaller models. I think GPT-4 is at about one trillion parameters. I was under the impression that model sizes were increasing at about 10x/year, so that could mean GPT-5 is 10 trillion and GPT-6 (or equivalent) is 100 trillion parameters by 2026. Does that sound about right, or is there some sort of algorithmic change likely to happen that will allow LLMs to improve without the number of parameters growing 10x/year?
On a related note, I've heard backend cluster sizes are...
10x per year for compute seems high to me. Naïvely I would expect the price/performance of compute to keep doubling every 1-2 years, as it has for a long time, with the overall compute available for training big models being a function of that plus increasing investment in the space, which could look more like one-time jumps. (I.e. a 10x jump in compute in 2024 may happen because of increased investment, but a further 100x increase by 2025 seems unlikely.) But I am somewhat uncertain of this.
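To make that concrete, here's a rough back-of-the-envelope sketch in Python. All the numbers (the ~1.5-year doubling time, the one-time 10x investment jump) are illustrative assumptions on my part, not measured values:

```python
# Back-of-the-envelope comparison of the growth rates discussed above.
# All numbers are illustrative assumptions, not measured values.

def hardware_growth(years, doubling_time_years=1.5):
    """Multiplier from price/performance doubling every ~1-2 years."""
    return 2 ** (years / doubling_time_years)

def with_investment_jump(years, one_time_jump=10.0):
    """Same trend plus a hypothetical one-time 10x jump from increased investment."""
    return hardware_growth(years) * one_time_jump

if __name__ == "__main__":
    for years in (1, 2, 3):
        print(f"After {years}y: hardware alone ~{hardware_growth(years):.1f}x, "
              f"with a one-time 10x investment jump ~{with_investment_jump(years):.1f}x, "
              f"vs a sustained 10x/year trend ~{10 ** years:.0f}x")
```

Even with a one-time 10x jump layered on top of the hardware trend, you end up well short of what a sustained 10x/year would imply after a couple of years, which is why the latter seems high to me.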
For parameters, I definitely think the largest models will keep getting bigger, and I expect compute to be the big driver of that -- but I also expect improvements like mixture-of-experts models to continue, which effectively allow more parameters with less compute (because not all of the parameters are used for every token). Other techniques, like RLHF, also improve the subjective performance of models without increasing their size (i.e. getting them to do useful things rather than only predicting which next word is most likely).
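If it helps, here's a toy NumPy sketch of the mixture-of-experts idea (my own minimal illustration, not any particular lab's implementation): the layer holds many experts' worth of parameters, but each token only passes through a couple of them.

```python
import numpy as np

# Toy top-k mixture-of-experts routing. Sizes and initializations are
# arbitrary illustrative choices.
rng = np.random.default_rng(0)

d_model, d_hidden = 16, 32
num_experts, top_k = 8, 2

# Each "expert" is a small two-layer ReLU MLP with its own parameters.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.02,
     rng.standard_normal((d_hidden, d_model)) * 0.02)
    for _ in range(num_experts)
]
gate_w = rng.standard_normal((d_model, num_experts)) * 0.02  # router weights


def moe_layer(x):
    """Route each token to its top_k experts and mix their outputs."""
    out = np.zeros_like(x)
    logits = x @ gate_w                              # (tokens, num_experts)
    for i, token in enumerate(x):
        chosen = np.argsort(logits[i])[-top_k:]      # indices of the top_k experts
        weights = np.exp(logits[i][chosen])
        weights /= weights.sum()                     # softmax over the chosen experts
        for w, e in zip(weights, chosen):
            w1, w2 = experts[e]
            out[i] += w * (np.maximum(token @ w1, 0) @ w2)  # ReLU MLP
    return out


tokens = rng.standard_normal((4, d_model))
y = moe_layer(tokens)
print(f"output shape: {y.shape}; "
      f"fraction of expert params used per token: {top_k / num_experts:.0%}")
```

The point is just that with 8 experts and top-2 routing, only ~25% of the expert parameters do work for any given token, so parameter count and compute per token come apart.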
I guess my prediction here would simply be that things like this continue, so that with X compute you could get a better model in 2025 than you could in 2023. But you could also have 5x to 50x more compute in 2025, so you get the combination of those improvements!
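One way to picture how those combine (treating the two gains as multiplicative is an assumption of mine, and the 3x algorithmic-efficiency figure is made up for illustration):

```python
# Toy illustration of "more compute AND better algorithms" stacking.
# Both the 3x algorithmic gain and the 5x-50x compute range are assumptions.

algorithmic_gain = 3  # hypothetical: same quality for 1/3 the compute by 2025

for compute_multiplier in (5, 50):
    effective = compute_multiplier * algorithmic_gain
    print(f"{compute_multiplier}x compute * {algorithmic_gain}x algorithmic "
          f"efficiency ~= {effective}x effective compute vs 2023")
```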
It's obviously far cheaper to experiment with smaller models, so I expect lots of improvements will first show up in models that are small for their time.
Just my thoughts!