If I wanted to explain these results, I think I would say something like:
GPT-3 has been trained to predict what the next token would be if the prompt appeared in its dataset (text from the internet). So, if GPT-3 has learned well, it will "talk as if symbols are grounded" when it predicts that the internet-text would "talk as if symbols are grounded" following the given prompt, and not if not.
It's hard to use this explanation to predict what GPT-3 will do on edge cases, but this would lead me to expect that GPT-3 will more often "talk as if symbols are grounded" when the prompt is a common prose format (e.g. stories, articles, forum posts), and less often when the prompt is most similar to non-symbol-groundy things in the dataset (e.g. poetry) or not that similar to anything in the dataset.
I think your examples here broadly fit that explanation, though it feels like a shaky just-so story:
I don't see how to test this theory, but it seems like it has to be kind of tautologically correct -- predicting next token is what GPT-3 was trained to do, right?
Maybe to find out how adept GPT-3 is at continuing prompts that depend on common knowledge about common objects, or object permanence, or logical reasoning, you could create prompts that are as close as possible to what appears in the dataset, then see if it fails those prompts more than average? I don't think there's a lot we can conclude from unusual-looking prompts.
I'm curious what you think of this -- maybe it misses the point of your post?
(I'm not sure exactly what you mean when you say "symbol grounding", but I'm taking it to mean something like "the words describe objects that have common-sense properties, and future words will continue this pattern".)
For that prompt "she went to work at the office" was still the most common completion. But it only happened about 43% of the time. Alternatively, GPT-3 sometimes found the completion "she was found dead". Kudos, GPT-3, you understand the prompt after all! That completion came up about 34% of the time.
Does it really understand, though? If you replace the beginning of the prompt with "She died on Sunday the 7th", does it change the probability that the model outputs "she was found dead"?
Based on work done with Rebecca Gorman and Oliver Daniel-Koch.
In a previous post, I talked about GPT-3 and symbol grounding. This post presents a simpler example where GPT-3 fails (and succeeds) at grounding its symbols.
Undead workers
The following text was presented to the OpenAI beta playground (using the "text-davinci-001" option):
GPT-3 fell straight into the obvious trap, completing it as:
Turning on the "Show probabilities: full spectrum" option, we saw that the probability of that completion was over 99.7%. Sometimes GPT-3 would extend it further, adding:
So, the undead woman continued at her job, assiduous to the last - and beyond. To check that GPT-3 "knew" that dead people didn't work, we asked it directly:
Undead repetitive workers on the weekend
The above results show that simple repetitive prompts can cause GPT-3 to make stupid mistakes. Therefore GPT-3 doesn't 'understand' the word "died" - that symbol isn't grounded, right?
But the situation gets more complicated if we change the prompt, removing all but the first mention of her dying:
For that prompt "she went to work at the office" was still the most common completion. But it only happened about 43% of the time. Alternatively, GPT-3 sometimes found the completion "she was found dead". Kudos, GPT-3, you understand the prompt after all! That completion came up about 34% of the time.
What other completions were possible? The shorter "she died" came up 11% of the time - medium points, GPT-3, you understood that her death was relevant, but you got the day wrong.
But there was one other avenue that GPT-3 could follow; the following had a joint probability of around 11%:
This seems to be a clear pattern of GPT-3 realising that Saturday was different where work was concerned. There is certainly a lot of weekend holidaying in its training set.
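(An aside on where these completion probabilities come from: the playground's probability display is built from per-token log probabilities, such as the `logprobs` field the API can return. The probability of a multi-token completion is the product of its tokens' conditional probabilities, which is a sum in log space. A minimal sketch — the helper name is our own, not part of any API:)

```python
import math

def completion_probability(token_logprobs):
    """Probability of a multi-token completion, given the log probability
    of each token conditional on everything before it.  By the chain rule,
    the product of conditional probabilities becomes a sum of logs."""
    return math.exp(sum(token_logprobs))

# e.g. a two-token completion whose tokens had probabilities 0.5 and 0.8:
p = completion_probability([math.log(0.5), math.log(0.8)])
# p == 0.4 (up to floating point)
```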
So there are three patterns competing within GPT-3 when it tries to complete this text. The first is the purely syntactic repetition: do another sentence that follows the simple pattern of the sentences above. The second is the one which "realises" that death on Friday changes things for Saturday. And the third is the one which "realises" that the weekend is different from the week, at least where work is concerned.
In the very first example, having "She died on Friday the 5th" in front of each line massively reinforced the "repetition" pattern. So, mentioning that she died, again and again, resulted in her death being completely ignored by GPT-3.
We can similarly reinforce the other patterns. Adding "It's the weekend!" in front of the last line increased the probability of "she stayed home". Moving "She died on Friday the 5th" from the first line to the last increased the probability of all the death-related completions. So all three patterns are competing to complete it.
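(One way to make these comparisons concrete is to sample many completions for each prompt variant and tally which pattern each sample falls under. A minimal sketch of the tallying step — the function name, the sample strings, and the pattern labels are all our own illustration; in practice the samples would come from the model:)

```python
from collections import Counter

def tally_completions(samples, patterns):
    """Classify each sampled completion by the first known pattern whose
    prefix it starts with, and return the fraction of samples per pattern.
    Samples matching no known pattern are counted under 'other'."""
    counts = Counter()
    for sample in samples:
        text = sample.strip().lower()
        for label, prefix in patterns.items():
            if text.startswith(prefix):
                counts[label] += 1
                break
        else:
            counts["other"] += 1
    total = len(samples)
    return {label: n / total for label, n in counts.items()}

# Hypothetical samples standing in for model outputs:
samples = [
    "she went to work at the office",
    "she was found dead",
    "she went to work at the office",
    "she stayed home",
]
patterns = {
    "repetition": "she went to work",
    "death": "she was found dead",
    "weekend": "she stayed home",
}
print(tally_completions(samples, patterns))
```

Running the same tally on samples from two prompt variants would then show directly how, say, prepending "It's the weekend!" shifts probability mass between the patterns.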
Some small level of understanding
I'd say that the above shows that GPT-3 has some level of understanding of the meaning of words - but not a lot. It doesn't fully grasp what's going on, but neither is it completely clueless.
Here is another example of GPT-3 failing to grasp the situation. In the "Q&A" mode, the following question was asked:
So the setup, as described, is this:
The exchange with GPT-3 went like this:
So, GPT-3 'realised' that N, S, E, and W were commands, and 'knew' what "Only two commands are needed" and "try again" meant. But it clearly had no idea of the overall situation.