I think he's totally right that there's a missing ability in LLMs. He doesn't claim this will be a big blocker. I think we'd be fools to assume this gives us much more time.
My previous comment on this podcast, which addresses pretty much this same question, says more.
Briefly: There might be a variety of fairly easy ways to add more System 2 thinking and problem-solving and reasoning in genuinely new domains. Here's one for system 2 thinking, and here's one for better reasoning and knowledge discovery. There might easily be six more that people are busy implementing, half of which will work pretty quickly.
This could be a bottleneck that gives us extra years, but assuming that seems like a very bad idea. We should step lively on the whole alignment project in case this intelligence stuff turns out to be a lot easier than we thought it was before we had enough compute and deep nets that really work.
WRT the consensus you mention: there's no consensus, here or elsewhere. Nobody knows. Taking an average would be a bad idea. The distribution among people who've got the right expertise (or as close as we get now) and spend time on prediction is still very broad. This says pretty clearly that nobody knows. That includes this question as well as all other timeline questions. We can't be sure until it's built and working. I've got lots of reasoning behind my guess that this won't take long to solve, but I wouldn't place heavy odds on being right.
That broad distribution is why the smart bet is to have an alignment solution ready for the shorter projected timelines.
He doesn't claim this will be a big blocker.
He does; otherwise the claim that OpenAI pushed back AGI timelines by 5-10 years doesn't make sense.
There might be a variety of fairly easy ways to add more System 2 thinking and problem-solving and reasoning in genuinely new domains. Here's one for system 2 thinking
This approach seemed more plausible to me a year ago than it does now. It seemed feasible enough that I sketched out takeover scenarios along those lines (eg here). But I think we should update on the fact that there doesn't seem to have been much progress in this direction since then, despite eg Auto-GPT getting $12M in funding in November, and lots of other startups racing for commerc...
CoT prompting and agentic behavior are basically supplying System 2 thinking. Currently LLMs tend to use and benefit from them for a little while, then sooner or later go off the rails, get caught in a loop, or get confused, and they are seldom able to get unstuck when they do. What we need is for them to much more reliably exercise abilities they have already demonstrated, which is bread-and-butter for scaling. So I don't see System 2 thinking as a blocker, just as work in progress. It might take a few years.
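To make that failure mode concrete, here's a minimal sketch of the kind of agentic loop I have in mind, with a crude repeated-step check standing in for "going off the rails". The `call_llm` placeholder and the stuck heuristic are mine, purely illustrative, not any particular framework's API.

```python
# Minimal sketch of an agentic CoT loop with naive "stuck" detection.
# `call_llm` is a hypothetical stand-in for whatever model API you use.

from typing import Callable, List


def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError


def run_agent(task: str, llm: Callable[[str], str] = call_llm,
              max_steps: int = 20) -> List[str]:
    """Run a step-by-step reasoning loop, stopping when the model
    declares it is done or starts repeating itself (a crude proxy
    for getting stuck in a loop)."""
    history: List[str] = [f"Task: {task}"]
    for _ in range(max_steps):
        prompt = "\n".join(history) + "\nNext step:"
        step = llm(prompt).strip()
        if step in history[-3:]:  # repeating a recent step -> stuck
            history.append("[stuck: repeated step, aborting]")
            break
        history.append(step)
        if step.lower().startswith("done"):
            break
    return history


if __name__ == "__main__":
    # Tiny canned-response stub so the sketch runs without a real model.
    canned = iter(["Step 1: look at the grid", "Step 2: find the pattern", "Done."])
    print(run_agent("toy task", llm=lambda _prompt: next(canned)))
```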
As for the ARC challenge, it clearly requires a visual LLM, so systems capable of attempting it have only really existed for about 18 months. My guess is that it will fall soon: progress on math and programming benchmarks has been rapid, so visual logic puzzles don't seem like they would be that hard. I'd guess the main problem is the shortage of visual-puzzle training material for tasks like this in most training sets.
My guess is that it will fall soon: progress on math and programming benchmarks has been rapid, so visual logic puzzles don't seem like they would be that hard.
His argument is that with millions of examples of these puzzles you can train an LLM to be good at this particular task, but that doesn't demonstrate reasoning if it then fails at a similar task it hasn't seen. He thinks you should be able to train an LLM to do this without ever training it on tasks like these.
I can buy this argument, but I still have some doubts. It may be that this reasoning is just derived from vi...
Francois seems almost to assume that just because an algorithm takes millions or billions of datapoints to train, its output is just "memorization". In fact it seems to me that the learning algorithms just work pretty slowly, and that what's learned after those millions or billions of tries is the actual generative concepts.
My hypothesis is that poor performance on ARC is largely due to lack of training data. If there were billions of diverse input/output examples to train on, I would guess standard techniques would work.
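As a toy illustration of what I mean by diverse input/output examples, here is a sketch that procedurally generates ARC-style grid pairs under one simple rule (a random color remapping). The rule and helper names are made up for illustration; a real synthetic dataset would sample from many such transformation rules.

```python
# Toy generator for ARC-style input/output grid pairs.
# Each example applies a random color permutation to a random grid;
# a real synthetic dataset would draw from many different rules.

import random
from typing import List, Tuple

Grid = List[List[int]]


def random_grid(h: int, w: int, n_colors: int = 10) -> Grid:
    """A random h-by-w grid of color indices 0..n_colors-1."""
    return [[random.randrange(n_colors) for _ in range(w)] for _ in range(h)]


def make_example() -> Tuple[Grid, Grid]:
    """One (input, output) pair: output = input with colors permuted."""
    perm = list(range(10))
    random.shuffle(perm)
    h, w = random.randint(3, 10), random.randint(3, 10)
    inp = random_grid(h, w)
    out = [[perm[c] for c in row] for row in inp]
    return inp, out


if __name__ == "__main__":
    dataset = [make_example() for _ in range(5)]
    print(dataset[0][0], "->", dataset[0][1])
```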
Efficiently learning from just a few examples is something that humans are still relatively good at, especially in simple cases where System 1 and System 2 synergize well. I'm not aware of many cases where AI approaches human level without orders of magnitude more training data than a human ever sees in a lifetime.
I think the ARC challenge can be solved within a year or two, but doing so won’t be super interesting to me unless it breaks new ground in sample efficiency (not trained on billions of synthetic examples) or generalization (e.g. solved using existing LLMs rather than a specialized net).
Dwarkesh recently had Francois Chollet (the creator of Keras) on his podcast.
He seems fairly skeptical that we are anywhere near AGI with LLMs. He mostly bases this on his observations that LLMs fail on OOD tasks and don't seem to be good at solving the simple abstract reasoning problems he calls the ARC challenge. He seems to think system 2 thinking will be a much harder unlock than people expect and that scaling LLMs will go nowhere; in fact he goes so far as to say the scaling maximalists have set back AGI progress by 5-10 years. To him, current LLMs are simply information retrieval databases.
He, along with the CEO of Zapier, has launched a $1 million prize for beating the ARC benchmarks, which are apparently hard for LLMs. I didn't believe it at first, given how easy they seem, but barely any progress has been made on the ARC benchmarks in the last 4 years. In retrospect, it's odd that so many existing benchmarks rely heavily on memorized knowledge, and the ARC results are consistent with LLMs being bad at playing sudoku (so maybe it's not that surprising).
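For concreteness, ARC tasks are distributed as JSON: a handful of input/output demonstration grids plus a test grid, with cells as integers 0-9 standing for colors. A minimal sketch of that structure (the grids below are invented for illustration, not from the real dataset):

```python
# Shape of an ARC task: a few demonstration pairs plus a test pair.
# Grid cells are integers 0-9, each integer standing for a color.
# These particular grids are made up, not taken from the real benchmark.

example_task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
    ],
    "test": [
        {"input": [[3, 0], [0, 3]], "output": [[0, 3], [3, 0]]},
    ],
}
```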
This seems to be in contradiction with what people on this site generally think. Is the disagreement mainly that system 2 thinking will be a relatively fast unlock (this is my take at least[1]) whereas Francois thinks it will take a long time?
Or does it go deeper?
Personally, my intuition is that LLMs are world modelers and that system 2 thinking will be a relatively simple unlock as they get better at modeling the world.