I agree with you, but natural intelligence, at least in humans, seems to be set up so as to incentivise the construction of subroutines and algorithms that help solve problems. What I mean is that we humans invented the calculator when we realised our brains are not very good at arithmetic, and now we have this device, which is sort of like a technological extension of ourselves. A proper AGI implemented in computer hardware should absolutely be able to implement a calculator of its own accord; the fact that it doesn't speaks to the ill...
I am happy to consider a distinction between world models and n-gram models; I just still feel there is a continuum of some sort if we look closely enough. n-gram models are sort of like networks with very few parameters. As we add more parameters to calculate the eventual probability in the softmax layer, at which point do the world models emerge? And when exactly do we start calling them world models? But I think we're on the same page with regard to the chess example. Your formulation of "GPT-4 does not care about learning chess" is spot on. And in my view that's the problem with GPT in general. All it really cares about is predicting words.
I think if we imagine an n-gram model where n approaches infinity and the size of the training corpus approaches infinity, such a model would be capable of going beyond even GPT. Of course it's unrealistic, but my point is simply that surface-level statistics are in principle enough to imitate intelligence the way ChatGPT does.
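To make "surface-level statistics" concrete, here is a minimal sketch of a count-based n-gram model. The function names (`train_ngram`, `next_token_probs`) and the toy corpus are my own, made up for illustration; the point is only that this kind of model is a lookup table of observed continuations and nothing more.

```python
from collections import Counter, defaultdict


def train_ngram(tokens, n):
    """Count how often each token follows each (n-1)-token context."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts


def next_token_probs(counts, context):
    """Relative-frequency estimate of P(next token | context)."""
    followers = counts.get(tuple(context))
    if not followers:
        # Unseen context: a pure lookup table has nothing to say here.
        return {}
    total = sum(followers.values())
    return {tok: c / total for tok, c in followers.items()}


# Toy corpus (hypothetical), bigram model (n = 2).
corpus = "the cat sat on the mat and the cat ran".split()
model = train_ngram(corpus, n=2)
probs = next_token_probs(model, ["the"])  # "cat" gets 2/3, "mat" gets 1/3
```

With a finite corpus, most long contexts are never seen and the table returns nothing, which is why the thought experiment needs both n and the corpus size to go to infinity.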
Of course, literally storing probabilities of n-grams is a super poorly compressed way of doing things, and ChatGPT clearly finds more efficient solutions as it moves through the loss landscape trying to minimize next token predict...
I don't understand why Eliezer changed his view that the current approach of Transformer next-token prediction is not the path towards AGI. It should not be surprising that newer versions of GPT will asymptotically approach a mimicry of AGI, but that shouldn't convince anyone that they are going to break through that barrier without a change in paradigm. None of the intelligent organisms we know of have imitation as their primary optimization objective; their objective function is basically to survive or avoid pain. As a result, they of cour...
Nice work. But I wonder why people are so surprised that these models and GPT would learn a model of the world. Of course they learn a model of the world. Even the skip-gram and CBOW word vectors people trained ages ago modelled the world, in the sense that, for example, named entities in vector space would be highly correlated with actual spatial/geographical maps. It should be 100% assumed that these models, which have many orders of magnitude more parameters, are learning much more sophisticated models of the world. What that tells us about their "intellige...
I agree that it's capable of doing that, but it just doesn't do it. If you ask it to multiply large numbers, a lot of the time it confidently gives you an incorrect answer instead of using its incredible coding skills to just calculate the answer. If it were trained via reinforcement learning to maximize a more global and sophisticated goal than merely predicting the next word correctly or avoiding linguistic outputs that some humans have labelled as good or bad, it's very possible it would go ahead and invent these tools and start using them, simply be...