My hot take:
Not too surprising to me, considering what GPT-3 could do. However, there were some people (and some small probability mass remaining in myself) saying that even GPT-3 wasn't doing any sort of reasoning and didn't have any substantial understanding of the world. Well, this is another nail in the coffin of that idea, in my opinion. Whatever this architecture is doing on the inside, it seems to be pretty capable and general.
I don't think this architecture will scale to AGI by itself. But its dramatic success is evidence that there are other architectures, not too far away in search space, that exhibit similar computational efficiency and scales-with-more-compute properties, and that are useful for a wider range of tasks.
This video conjectures that GPT-3 was literally just memorizing everything from the training corpus and remixing it, without complex reasoning: https://www.youtube.com/watch?v=SY5PvZrJhLE
The same conjecture could apply to GPT-I.
So, GPT-3 is something like a giant look-up table? One that interpolates an answer from the few nearest recorded answers, while the actual intellectual work was performed by those who created the training dataset?
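To make the look-up-table conjecture concrete, here's a toy sketch of what such a system would literally be: answer a new prompt by retrieving the stored prompt most similar to it and returning its recorded completion. Everything here (the bag-of-words "embedding", the tiny corpus) is made up purely for illustration; no one is claiming GPT-3 is implemented this way.

```python
# Toy illustration of the "giant look-up table" conjecture: answer a new
# prompt by echoing the completion of the nearest stored prompt.
# The corpus and all names are hypothetical, for illustration only.
from collections import Counter
import math

def bag_of_words(text):
    """Crude 'embedding': a word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical "training corpus" of prompt -> completion pairs.
corpus = {
    "the capital of france is": "Paris",
    "the capital of germany is": "Berlin",
    "two plus two equals": "four",
}

def lookup_table_answer(prompt, k=1):
    """Return the completions of the k stored prompts nearest to `prompt`."""
    q = bag_of_words(prompt)
    ranked = sorted(corpus, key=lambda p: cosine(q, bag_of_words(p)),
                    reverse=True)
    return [corpus[p] for p in ranked[:k]]

print(lookup_table_answer("what is the capital of france"))  # -> ['Paris']
```

The point of the sketch is that all the "intelligence" lives in the stored pairs; the retrieval step does no reasoning at all, which is exactly the claim being debated above.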