As far as I can tell, Sam is saying no to size. That does not mean saying no to compute, data, or scaling.
"Hundreds of complicated things" comment definitely can't be interpreted to be against transformers, since "simply" scaling transformers fits the description perfectly. "Simply" scaling transformers involves things like writing a new compiler. It is simple in strategy, not in execution.
Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them.
This seems to imply a slowdown in scaling compute, and Sam previously acknowledged that the data bottleneck was a real roadblock.
The poll appears to be asking two opposite questions. I'm not clear on whether 99% means it will be a transformer, or whether it means something else is needed to get there.
Maybe Sam knows a lot I don't know, but here are some reasons why I'm skeptical about the end of scaling large language models:
Because scaling laws are power laws (loss falls off as a power of compute, so the curve only looks like a straight line on a log-log plot), there are diminishing returns to resources like more compute. But I doubt we've reached the point where the marginal cost of training larger models exceeds the marginal benefit. Think of a company like Google: building the biggest and best model is immensely valuable in a global, winner-takes-all market like search.
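To make the diminishing-returns point concrete, here is a minimal sketch. The functional form is a generic Chinchilla-style power law, and the constants `a` and `alpha` are made-up illustrative values, not fitted ones:

```python
# Illustrative only: loss as a Chinchilla-style power law in training
# compute, L(C) = a * C**(-alpha). The constants are made up.
a, alpha = 10.0, 0.05

def loss(compute: float) -> float:
    """Loss as a power law in training compute (FLOPs)."""
    return a * compute ** -alpha

# Each additional 10x of compute buys a smaller absolute loss reduction,
# even though loss keeps falling: the diminishing-returns shape.
for exp in range(20, 26):
    c = 10.0 ** exp
    print(f"compute=1e{exp}: loss={loss(c):.3f}, "
          f"gain from last 10x={loss(c / 10) - loss(c):.3f}")
```

Whether marginal cost exceeds marginal benefit then comes down to how much a fixed loss reduction is worth, which in a winner-takes-all market can be an enormous amount.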
I'm reaching the same conclusions.
And this is in a world where Google has already announced that they're going to build an even bigger model of their own.
We are not, and won't be for some time.
I was chatting with a friend of mine who works in the AI space. He said that the big thing that got them to GPT-4 was the data set, which was basically the entire internet. But now that they've given it the entire internet, there's no easy way to go further along that axis; the next big increase in capabilities would require a significantly different direction than "more text / more parameters / more compute".
I'd have to disagree with this assessment. Ilya Sutskever recently said that they've not run out of data yet; they might some day, but not yet. And Epoch projects that high-quality text data will run out in 2024, with all text data running out by 2040.
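As a back-of-envelope illustration of how such exhaustion projections work (the stock, usage, and growth figures below are placeholder assumptions, not Epoch's actual estimates):

```python
import math

# Placeholder assumptions, NOT Epoch's figures: a fixed stock of
# high-quality tokens, and frontier training sets growing exponentially.
stock_tokens = 1e13        # assumed total stock of high-quality tokens
used_in_2023 = 1e12        # assumed tokens consumed by a 2023 frontier run
annual_growth = 2.0        # assumed yearly growth factor in dataset size

# Years until a single run's dataset would exceed the whole stock.
years = math.log(stock_tokens / used_in_2023, annual_growth)
print(f"Stock exhausted in ~{years:.1f} years, i.e. around {2023 + years:.0f}")
```

The real projections rest on the same arithmetic, just with carefully estimated stocks and growth rates rather than round numbers.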
Maybe efficiency improvements will rule temporarily, but surely once the low- and medium-hanging fruit is exhausted, parameter count will once again be ramped up. I would bet just about anything on that.
And of course, if we believe efficiency is the way to go for the next few years, that should scare the shit out of us: it means that even putting all GPU manufacturers out of commission might not be enough to save us, should it become obvious that a slowdown is needed.
Shutting down GPU production was never in the Overton window anyway, so this makes little difference. Even if further scaling isn't needed, most people can't afford the ~$100M spent on GPT-4.
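For a rough sense of where a figure like $100M comes from, here is a toy cost estimate. Every input is an assumption for illustration (GPT-4's actual compute, hardware, and pricing are not public); the only hard number is the A100's peak bf16 throughput:

```python
# All inputs are illustrative assumptions, not disclosed GPT-4 figures.
total_flops = 2e25           # assumed total training compute, FLOPs
gpu_flops = 312e12           # A100 peak bf16 throughput, FLOP/s
utilization = 0.4            # assumed fraction of peak actually achieved
dollars_per_gpu_hour = 2.0   # assumed rental price per GPU-hour

gpu_seconds = total_flops / (gpu_flops * utilization)
gpu_hours = gpu_seconds / 3600
cost = gpu_hours * dollars_per_gpu_hour
print(f"~{gpu_hours:.2e} GPU-hours, ~${cost / 1e6:.0f}M")
```

With these numbers the estimate lands near $90M, which is why training runs at this scale are out of reach for almost everyone regardless of whether further scaling is needed.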
There wasn't much content in this article beyond the quotes below.
Sam Altman says
Nick Frosst, a cofounder at Cohere, says
In the Lex Fridman interview with Sam Altman, Sam said that they had to do "hundreds of complicated things". Does this, together with the above quote, suggest Sam thinks transformers are running out of oomph? Is he, perhaps, pausing progress whilst we await the next breakthrough in deep learning?
Edit: Added a relevant question. Give your probability for the first option.
Or vote at this tweet, if you like:
https://twitter.com/Algon_33/status/1648047440065449993