At the latest EAG in London, I was challenged to explain what concept extrapolation would mean for GPT-3.
My first thought was the example from this post, where there were three clear patterns fighting each other for possible completions: the repetition pattern where she goes to work, the "she's dead, so she won't go to work" pattern, and the "it's the weekend, so she won't go to work" pattern.
That feels somewhat like possible "extrapolations" of the initial data. But the idea of concept extrapolation is that the algorithm is trying to cope with a shift in world-model, and extend its goal to that new situation.
What is the world-model of GPT-3? It consists of letters and words. What is its "goal"? To complete sentences in a coherent and humanlike way. So I tried the following expression, which would be close to its traditional world-model while expanding it a bit:
ehT niar ni niapS syats ylniam ni eht
What does this mean? Think of da Vinci. The correct completion is "nialp", the reverse of "plain".
I ran that through the GPT-3 playground (text-davinci-002, temperature 0.7, maximum length 256), and got:
ehT niar ni niapS syats ylniam ni eht teg dluoc I 'segaJ niar ni dna ro niar ni eht segauq ,ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ,ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni e
I think we can safely say it broke GPT-3. The algorithm seems to have caught the fact that the words were spelt backwards, but has given up on any attempt to order them in a way that makes sense. It has failed to extend its objective to this new situation.
I was slightly surprised to find that even fine-tuning GPT-Neo-125M for a long time on many sequences of letters followed by spaces, followed by a colon, followed by the same sequence in reverse, was not enough to get it to pick up the pattern - probably because the positional encoding vectors make the difference between e.g. "18 tokens away" and "19 tokens away" a rather subtle difference. However, I then tried fine-tuning on a similar dataset with numbers in between (e.g. "1 W 2 O 3 R 4 D 5 S : 5 S 4 D 3 R 2 O 1 W") (or similar representation -- can't remember exactly, but something roughly like that) and it picked up the pattern right away. Data representation matters a lot!