Dwarkesh recently had a podcast with Francois Chollet (the creator of Keras).
He seems fairly skeptical that we are anywhere near AGI with LLMs. He mostly bases this intuition on the observation that LLMs fail on OOD tasks and don't seem to be good at solving the simple abstract reasoning problems in his ARC challenge. He thinks system 2 thinking will be a much harder unlock than people expect and that scaling LLMs will go nowhere; in fact, he goes so far as to say the scaling maximalists have set back AGI progress by 5-10 years. To him, current LLMs are simply information-retrieval databases.
He, along with the CEO of Zapier, has launched a $1 million prize for beating the ARC benchmarks, which are apparently hard for LLMs. I didn't believe it at first, given how easy they seem, but barely any progress has been made on the ARC benchmarks in the last 4 years. In retrospect, it's odd how heavily so many existing benchmarks rely on memorized knowledge, and the ARC results fit with LLMs being bad at playing sudoku (so maybe it's not that surprising).
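For anyone who hasn't looked at the tasks: as I understand the public ARC repo's format, each puzzle is a tiny JSON file of grid transformations, a few input/output demonstration pairs plus test inputs whose outputs you have to produce. A minimal sketch of reading one (the filename is made up):

```python
import json

# Hypothetical filename -- ARC tasks in the public repo are stored as JSON like this.
with open("arc_task.json") as f:
    task = json.load(f)

# A handful of demonstration pairs; grids are small 2D lists of integers 0-9 (colors).
for pair in task["train"]:
    print("demo input: ", pair["input"])
    print("demo output:", pair["output"])

# The challenge: infer the transformation rule from the demos alone
# and produce the output grid for each test input.
for pair in task["test"]:
    print("test input:", pair["input"])
```

There's no big corpus to memorize from: each task is its own novel rule, which is presumably why retrieval-heavy models struggle.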
This seems to contradict what people on this site generally think. Is the disagreement mainly that system 2 thinking will be a relatively fast unlock (this is my take, at least[1]), whereas Francois thinks it will take a long time?
Or does it go deeper?
- ^ Personally, my intuition is that LLMs are world modelers, and that system 2 thinking will be a relatively simple unlock as they get better at modeling the world.
I was also surprised that interpreting webpages was a major blocker; they're just text and HTML, as you say.
I don't remember who said this, but I believed them because they'd actually tried to make useful agents: real modern webpages are such a flaming mess of complex HTML that LLMs get confused easily.
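A quick way to get a feel for this (just a sketch, assuming you have requests and BeautifulSoup installed; the URL is a placeholder): fetch any modern page and compare the size of the raw HTML to the visible text that survives.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL -- substitute any modern, framework-heavy site.
url = "https://example.com"
html = requests.get(url, timeout=10).text

soup = BeautifulSoup(html, "html.parser")
# Drop script/style tags, which carry no visible text.
for tag in soup(["script", "style"]):
    tag.decompose()
visible_text = soup.get_text(separator=" ", strip=True)

# On most real sites the markup dwarfs the text a user actually reads,
# and all that nested structure is what the model has to wade through.
print(f"raw HTML:     {len(html):,} chars")
print(f"visible text: {len(visible_text):,} chars")
```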
Your last point, whether a path to easier-to-align AGI or more time to work on alignment is preferable, is a very complex issue. I don't have a strong opinion since I haven't worked through it all. But I think there are very strong reasons to believe LLM-based AGI is far easier to align than other forms, particularly if the successful approach doesn't rely heavily on RL. So I think your opinion is in the majority, but nobody has worked it through carefully enough to have a really good guess. That's a project I'd like to embark on, by writing a post making the controversial suggestion that maybe we should be actively building LMA AGI as the safest of a bad set of options.