My hot take:
Not too surprising to me, considering what GPT-3 could do. However, there were some people (and some small probability mass remaining in myself) saying that even GPT-3 wasn't doing any sort of reasoning, didn't have any substantial understanding of the world, etc. Well, this is another nail in the coffin of that idea, in my opinion. Whatever this architecture is doing on the inside, it seems to be pretty capable and general.
I don't think this architecture will scale to AGI by itself. But the dramatic success of this architecture is evidence that there are other architectures, not too far away in search space, that exhibit similar computational efficiency and scales-with-more-compute properties, and that are useful for a wider range of tasks.
My first thought was that they put some convolutional layers in to preprocess the images and then used the GPT architecture, but no, it's literally just GPT again...
Does this maybe give us evidence that the brain isn't anywhere near a peak of generality, since we use specialised circuits for processing image data (which convolutional layers were based on)?
Not necessarily. There is no gene which hardcodes a convolutional kernel into the brain which we can look at and say, 'ah yes, the brain is implementing a convolution, and nothing else'. Attention mechanisms for images learn convolution-like patterns (just more flexibly, and not pre-hardwired): to the extent that convolutions are powerful because they learn things like spatial locality (which is obviously true & useful), we would expect any more general learning algorithm to also learn similar patterns and look convolution-like. (This is a problem which...
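To make the "attention can look convolution-like" point concrete, here is a toy sketch (plain Python, no libraries). It assumes one attention head per kernel tap, with attention scores that depend only on the relative position between query and key and are sharply peaked at that tap's offset; summing the heads, each scaled by its kernel weight, then reproduces a 1D convolution. The hand-set scores, the hypothetical `sharpness` parameter and the one-head-per-tap layout are illustrative assumptions only: a trained model would have to *learn* patterns like these, and this is not how Image GPT is actually implemented.

```python
import math


def conv1d(x, w):
    """Zero-padded 'same' 1D cross-correlation: y[i] = sum_k w[k] * x[i + k - r]."""
    r = len(w) // 2
    n = len(x)
    out = []
    for i in range(n):
        s = 0.0
        for k, wk in enumerate(w):
            j = i + k - r
            if 0 <= j < n:  # positions outside the sequence count as zero
                s += wk * x[j]
        out.append(s)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_as_conv(x, w, sharpness=50.0):
    """Sum of attention heads whose scores depend only on relative position.

    Head k strongly prefers the key at relative offset (k - r) from the query,
    so its softmax is nearly one-hot; its scalar value projection is w[k].
    Zero padding mirrors the zero boundary handling of conv1d above.
    """
    r = len(w) // 2
    padded = [0.0] * r + list(x) + [0.0] * r
    n = len(x)
    out = [0.0] * n
    for k, wk in enumerate(w):
        offset = k - r
        for i in range(n):
            q = i + r  # index of x[i] inside the padded sequence
            scores = [-sharpness * abs((j - q) - offset) for j in range(len(padded))]
            attn = softmax(scores)
            out[i] += wk * sum(a * v for a, v in zip(attn, padded))
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    w = [0.25, 0.5, 0.25]
    print(conv1d(x, w))
    print(attention_as_conv(x, w))
```

With a large `sharpness` the two outputs agree to floating-point precision. The difference the comment points at is that a convolution has its locality fixed in advance, whereas attention only behaves like this if its position-dependent scores happen to end up peaked on nearby positions.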