This is a linkpost for https://arxiv.org/abs/2212.09196
Taylor Webb, Keith J. Holyoak, Hongjing Lu, December 2022

GPT-4

In one type of analogical reasoning where GPT-3 still fared worse than humans, story analogies, GPT-4 improved significantly. In a lecture about this paper at the Santa Fe Institute, Taylor Webb shared the results of GPT-4 testing:

Taylor: "I was most astounded by that GPT-4 often produces very precise explanations of why one of the answers is not a very good answer. […] all of the same things happen [in the stories], and then [GPT-4] would say, ‘The difference is, in this case, this was caused by this, and in that case, it wasn’t caused by that.’ Very precise explanations of the analogies."

Also, in the Q&A session of the lecture, people discuss some difficult analogical reasoning tasks that most people resort to solving "symbolically" and iteratively (for example, by trying different candidate patterns and mentally checking each for logical contradictions), yet GPT-3 somehow manages to solve them in a single auto-regressive rollout. This reminds me of GPT can write Quines now (GPT-4): both capabilities seem to point to a powerful reasoning ability that Transformers have but people don't.

I also recommend listening to the Q&A session after the lecture.