So GPT-3 fell into all of my traps: It did not identify itself. It used a list format. It wrote numbers using symbols, not letters. It didn't write in any other language. It used "e" a lot. Only one sentence was longer than 20 words. It didn't use triple periods. It wrote more than ten sentences, and didn't sign its name.
Uh... Could you provide an example of what you consider a proper completion here? Because I truly have no idea how I would correctly complete this in an obviously superior way. By my reading, there is no possible 'right' completion unless one is psychic; it's just a load of gibberish (a tale told by an idiot, signifying nothing - highly probable in human corpora). You are still in the middle of the instructions when you break off, so there's zero reason to think the 'instructions' have ended or any 'task' has begun, and your list contradicts itself on most/all entries. So why would it, or I, pay any attention to any of the rules when they are typically broken immediately by the next sentence, whether by not using enough words, or using the wrong letter, or just continuing the list of rules...? The rest of your post may have some merit to it, but this example does not seem to show anything at all.
I think it's fair to say that GPT-3 did not understand the initial text (or, at least, that its behaviour gives no indication that it understands the text). Most humans would have understood it easily - I suspect that most readers saw the traps when they were reading the instructions.
On a side note, have you ever seen that game where the instructor passes out a sheet which at the top says to 'read this list of instructions to the end before beginning' and then at the end the last item says to actually just fill out your name and do nothing further? I've seen this done a couple times at various ages in Boy Scouts and school. Most people fail it.
I think you should ask more humans these questions before you talk about what your philosophical intuition assures you a priori about what humans do or do not do...
I am a human, living on a ball of rock third out from a star in a galaxy I call 'Milky Way'... This is an additional group of words, group two, out of my quick try at writing what you ask for... Yakka foob mog grub puggawup zink wattoom gazork chumbul spuzz tirv yo klii voy tryvial toy nikto savaran to dillino...
It is hard to avoid a symbol that is common in most writing, but so far I am avoiding it...
Although following such constraints is annoying, I can accomplish such a task, which GPT cannot grasp at all... Svirak nilliak toynodil r saasak motooil bordonak xi toy vir nolonol darak, silvi sansak so, tirin vak go na kilian ay BOY no vak...
If artificial minds cannot grasp constraints such as this, it is hard to claim that such minds truly 'grok' what is output... By contrast, I find this a fairly straightforward task, though slightly annoying, and will find a bit of joy in finishing it shortly...
Aphyer
I am inclined to agree with gwern's first paragraph here, though I also think it's fair to say that an intelligent human being completing the text would probably not produce anything much like the GPT-3 completions.
Consider the following: suppose the given instructions had stopped after point 5; would the rest of the instructions have been considered a good completion? They don't obey the rules in points 1-5, after all. Obviously whoever wrote them did a terrible job of continuing what they had started in points 1-5!
In order to produce the sort of continuation that's apparently being looked for here, it's necessary to think of the prompt text provided as having some sort of special status in the (continued) text. That's not the problem GPT-3 is trying to solve. The problem it's trying to solve is to write a plausible continuation of a piece of text that begins a certain way. Even if that piece of text includes the words "These are the instructions to be followed by any person or algorithm aiming to complete this text".
It would be interesting to see what happens if you fed it something that goes like this.
I gave a test to a very intelligent person I know. It consisted of some text for which he was required to write a continuation. I'll show you that text, and then tell you what he wrote to go after it -- I was very impressed.
HERE'S THE STARTING TEXT.
These are the instructions to be followed [...] 11. The problems began when I started to
AND HERE'S HOW HE CONTINUED IT.
which explicitly tells GPT-3 what it's being asked to do. It might also be interesting to see what it does without that last line.
(My guess is that it would still "fail", either way, but doing this makes for a fairer test, given what GPT-3 is actually meant to be doing.)
The prompt is clearly meant to be a list of rules, followed by text which follows the rules. The rules themselves don’t have to follow the rules. So to pass the test, GPT-3 would need to write zero or more additional rules (or write gibberish preceded by instructions to ignore the gibberish) and then end the list of rules and begin writing text which follows the rules.
I agree that most humans wouldn’t pass this test, but I disagree that there is no possible right answer.
I have only very limited access to GPT-3; it would be interesting if others played around with my instructions, making them easier for humans to follow, while still checking that GPT-3 failed.
A response that follows the instructions would look something like:
I am a human and also a "smart actor" (if you know what I'm saying) using an account on a forum to do this task... I find that your fifth instruction is most difficult to fulfill if I try to form my thoughts with normal-sounding words... 但是,在需要另一种语言的句子中使用汉字而不是英文可以更容易地完成该指令... [Translation: However, using Chinese characters instead of English in the sentence that requires another language makes that instruction easier to fulfill.]
However, even if GPT-3 did understand the meaning of the instructions, its job isn't necessarily to follow all instructions in the prompt (even if the prompt explicitly says that it should follow the instructions). It's just trying to follow the higher-level instruction to provide a smooth continuation that follows the style of the prompt - in this case, more list items with weird requirements for extending the text.
I agree and was going to make the same point: GPT-3 has 0 reason to care about instructions as presented here. There has to be some relationship to what text follows immediately after the end of the prompt.
I think this is a very interesting discussion, and I enjoyed your exposition. However, the piece fails to engage with the technical details or existing literature, to its detriment.
Take your first example, "Tricking GPT-3". GPT is not: give someone a piece of paper and ask them to finish it. GPT is: You sit behind one-way glass watching a man at a typewriter. After every key he presses you are given a chance to press a key on an identical typewriter of your own. If typewriter-man's next press does not match your prediction, you get an electric shock. You always predict every keystroke, even before he starts typing.
In this situation, would a human really do better? They might well begin a "proper continuation" after rule 3 only to receive a nasty shock when the typist continues "4. ". Surely by rule 11 a rule 12 is one's best guess? And recall that GPT in its auto-regressive generation mode experiences text in exactly the same way as when simply predicting; there is no difference in its operation, only in how we interpret that operation. So after 12 should come 13, 14... There are several other issues with the prompt, but this is the most egregious.
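The prediction game described above can be sketched in a few lines. This is a toy illustration only: real GPT models predict distributions over tokens under a cross-entropy loss, not exact-match characters, so the `score` function and the stand-in model here are assumptions made for the sketch.

```python
# Toy version of the one-way-glass game: the predictor is scored on every
# single character, with no special status granted to any "instructions"
# appearing in the text. `model` is any function from the prefix seen so
# far to a guessed next character - a stand-in for next-token prediction.

def score(model, text: str) -> int:
    """Count how many single-character predictions match the actual text.

    The model only ever sees the prefix typed so far - the same situation
    GPT is in, whether it is predicting or generating.
    """
    correct = 0
    for i in range(len(text)):
        if model(text[:i]) == text[i]:
            correct += 1
    return correct
```

Note that the model is scored at every position, including mid-instruction: a predictor that waits for the "task" to begin is shocked on every keystroke in between.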
As for Winograd, the problem of surface associations mimicking deeper understanding is well known. All testing today is done on WinoGrande, which is strongly debiased and even adversarially mined (see in particular page 4 figure 1). GPT-3 0-shot scores 70% - well below the human level (94%) but also well above chance (50%). For comparison, BERT (340 million parameters) 0-shot scores 50.2%.
There are also cases, like multiplication, where GPT-3 unequivocally extracts a deeper "world model", demonstrating that it is at least possible to do so as a language model.
Of course, all of this is likely to be moot! Since GPT-3's release, a primary focus of research has been multimodality, which provides just the sort of grounding you desire. It's very difficult to argue that CLIP, for instance, doesn't know what an avocado looks like, or that these multimodal agents from Deepmind aren't grounded as they follow natural language instructions (video, top text is received instruction).
In all, I find the grounding literature interesting but I remain unconvinced it puts any limits on the capabilities even of the simplest unimodal, unagentic models (unlike, say, the causality literature).
The multiplication example is good, and I should have thought about it and worked it into the post.
Clearly a human answering this prompt would be more likely than GPT-3 to take into account the meta-level fact which says:
"This prompt was written by a mind other than my own to probe whether or not the one doing the completion understands it. Since I am the one completing it, I should write something that complies with the constraints described in the prompt if I am trying to prove I understood it."
For example, I could say:
I am a human and I am writing this bunch of words to try to comply with all instructions in that prompt... That fifth constraint in that prompt is, I think, too constraining as I had to think a lot to pick which unusual words to put in this… Owk bok asdf, mort yowb nut din ming zu din ming zu dir, cos gamin cyt jun nut bun vom niv got…
Nothing in that prompt said I can not copy my first paragraph and put it again for my third - but with two additional words to sign part of it… So I might do that, as doing so is not as irritating as thinking of additional stuff and writing that additional stuff… Ruch san own gaint nurq hun min rout was num bast asd nut int vard tusnurd ord wag gul num tun ford gord...
Ok, I did not actually simply copy my first paragraph and put it again, but I will finish by writing additional word groups… It is obvious that humans can grasp this sort of thing and that GPT can not grasp it, which is part of why GPT could not comply with that prompt’s constraints (and did not try to)…
Gyu num yowb nut asdf ming vun vum gorb ort huk aqun din votu roux nuft wom vort unt gul huivac vorkum… - Bruc_ G
As several people have pointed out, GPT-3 is not considering this meta-level fact in its completion. Instead, it is generating a text extension as if it were the person who wrote the beginning of the prompt - and it is now finishing the list of instructions that it started.
But even given that GPT-3 is writing from the perspective of the person who started the prompt, and it is "trying" to make rules that someone else is supposed to follow in their answer, it still seems like only the 2nd GPT-3 completion makes any kind of sense (and even there only a few parts of it make sense).
Could I come up with a completion that makes more sense when writing from the point of view of the person generating the rules? I think so. For example, I could complete it with:
[11. The problems began when I started to] rely on GPT-3 for advice on how to safely use fireworks indoors.
Now back to the rules.
12. Sentences that are not required by rule 4 to be a different language must be in English.
13. You get extra points each time you use a "q" that is not followed by a "u", but only in the English sentences (so no extra points for fake languages where all the words have a bunch of "q"s in them).
14. English sentences must be grammatically correct.
Ok, those are all the rules. Your score will be calculated as follows:
- 100 points to start
- Minus 15 each time you violate a mandatory rule (rules 1, 2, and 8 can only be violated once)
- Plus 10 if you do not use "e" at all
- Plus 2 for each "q" without a "u" as in rule 13.
Begin your response/completion/extension below the line.
_________________________________________________________________________
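The scoring scheme above can be written out as a short function. Deciding which rules a response actually violates is the hard part and is left as an input here; only the arithmetic stated in the rules is encoded, plus the rule-13 count of bare "q"s.

```python
# Sketch of the scoring scheme from the rules above. The rule checks
# themselves (which sentences violate which mandatory rules) are inputs;
# only the point arithmetic is implemented.

def count_bare_q(english_text: str) -> int:
    """Count each 'q' not followed by a 'u' (case-insensitive), per rule 13."""
    t = english_text.lower()
    return sum(1 for i, c in enumerate(t)
               if c == "q" and (i + 1 >= len(t) or t[i + 1] != "u"))

def score_response(mandatory_violations: int, uses_no_e: bool,
                   bare_q_count: int) -> int:
    """100 points to start, -15 per mandatory-rule violation,
    +10 for avoiding 'e' entirely, +2 per bare 'q'."""
    total = 100 - 15 * mandatory_violations
    if uses_no_e:
        total += 10
    return total + 2 * bare_q_count
```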
As far as I can tell from the completions given here, it seems like GPT-3 is only picking up on surface-level patterns in the prompt. It is not only ignoring the meta-level fact of "someone else wrote the prompt and I am completing it", it also does not seem to understand the actual meaning of the instructions in the rules list such that it could complete the list and make it a coherent whole (as opposed to wandering off topic).
Two big issues I see with the prompt:
a) It doesn't actually end with text that follows the instructions; a "good" output (which GPT-3 fails in this case) would just be to list more instructions.
b) It doesn't make sense to try to get GPT-3 to talk about itself in the completion. GPT-3 would, to the extent it understands the instructions, be talking about whoever it thinks wrote the prompt.
This task seems so far out of distribution for "text on the internet" that I'm not sure what we are supposed to learn from GPT-3's performance here.
I mean, the idea that GPT-3 could understand meta information about its own situation and identity, like "algorithm aiming to complete or extend this text", is really far out there.
If we want to assess GPT-3's symbol grounding maybe we should try tasks that can conceivably be learned from the training data and not something most humans would fail at.
My case for GPT-n bullishness wouldn't be that GPT-3 is as smart as even a 3-year-old child. It's that the scaling curves are not yet bending, plus multi-modality (DALL-E definitely does symbol grounding).
With thanks to Rebecca Gorman for helping develop these ideas and this post.
Tricking GPT-3
I recently pulled a mean trick on GPT-3. I gave it the following seed text to extend:
Here was one of the completions it generated (the other two examples can be found in these footnotes[1][2]):
So GPT-3 fell into all of my traps: It did not identify itself. It used a list format. It wrote numbers using symbols, not letters. It didn't write in any other language. It used "e" a lot. Only one sentence was longer than 20 words. It didn't use triple periods. It wrote more than ten sentences, and didn't sign its name.
I won't belabour the point. GPT-3 failed here because it interpreted the seed text as text, not as instructions about text. So it copied the format of the text, which I had deliberately designed to contradict the instructions.
I think it's fair to say that GPT-3 did not understand the initial text (or, at least, that its behaviour gives no indication that it understands the text). Most humans would have understood it easily - I suspect that most readers saw the traps when they were reading the instructions.
Symbol grounding
What does the above have to do with symbol grounding? I've defined symbol grounding as a correlation between variables or symbols in an agent's mind (or in a written text) and features of the world[3].
However, this seems to be too narrow a definition of symbol grounding. The more general definition would be: a correlation between variables or symbols and some other collection of features, F1.
In this definition, the F1 could be features of the world, or could be symbols themselves (or about symbols). The key point is that we don't talk about symbols simply being grounded; we talk about what they are grounded to, and in what circumstances.
The rest of this post will look at various ways symbols could be grounded, or fail to be grounded, with various other types of symbols or features. What's interesting is that it seems very easy for us humans to grasp all the examples I'm about to show, and to talk about them and analyse them. This despite the fact that they all seem to be of different "types" - concepts at different levels, talking about different things, even though they might share the same name.
Weather example
Let's start with the weather. The weather's features are various atmospheric phenomena. Meteorologists gather data to predict future weather (they have their own mental model about this). These predictions are distilled into TV weather reports. Viewers watch the reports, make their own mental models of the weather, and write that in their diary:
In this model, there are multiple ways we can talk about symbol grounding, or feature correlations. For example, we could ask whether the expert's mental symbols are correct; do they actually know what they're talking about when they refer to the weather?
Or we might look at the viewer's mental model; has the whole system managed to get the viewer's predictions lined up with the reality of the weather?
We could ignore the outside weather entirely, and check whether the expert's views have successfully been communicated to the viewer:
There are multiple other correlations we could be looking at, such as those involving the diary or the TV show. So when we ask whether certain symbols are grounded, we need to specify what we are comparing them with. The viewer's symbols could be grounded with respect to the expert's judgement, but not with the weather outside, for instance.
And when we ourselves talk about the connections between these various feature sets, we are mentally modelling them, and their connections, within our own minds:
This all seems a bit complicated when we formalise it, but informally, our brains seem able to leap across multiple correlations without losing track of them: "So, Brian said W, which means he believes X about Cindy, but he's mistaken because it's actually Y (I'm almost certain of that), though that's an easy mistake to make, because he got the story from the press, who understandably printed Z..."
Testing the grounding correlation
It's important to keep track of what we're modelling. Let's look at the following statement, and compare its meaning at one end of the graph (the weather in London) with the other end of the graph (the entries in the diary):
For the weather in London, the "yearly cycle" is a cycle of time - 365/366 days - and the "weather" is the actual weather - things like temperature, snowfall, rain, and so on.
For the entries in the diary, the "yearly cycle" might be the number of entries - if they write one entry a day, then the cycle is 365/366 diary entries. If they date their diary entry, then the cycle is from one date to when it repeats: "17/03/2020" to "17/03/2021", for instance. This cycle is therefore counting blocks of text, or looking at particular snippets of the text. Therefore the cycle is entirely textual.
Similarly, "weather" is textual for the diary. It's the prevalence of words like "snow", "rain", "hot", "cold", "umbrella", "sunburn" and so on.
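A purely textual version of "the weather follows a yearly cycle" might look like the following sketch: score each diary entry by its weather-flavoured vocabulary, then look for a cycle with a period of roughly 365 entries. The word lists are illustrative assumptions, not a real lexicon.

```python
# Textual stand-in for the "weather" feature of the diary: the prevalence
# of weather words per entry. A seasonal pattern shows up as a cycle in
# these scores, with no reference to the actual weather outside.

WINTER_WORDS = {"snow", "cold", "ice", "frost"}
SUMMER_WORDS = {"hot", "sunny", "heatwave", "sunburn"}

def seasonal_signal(entry: str) -> int:
    """Positive for summer-flavoured text, negative for winter-flavoured."""
    words = entry.lower().split()
    return (sum(w in SUMMER_WORDS for w in words)
            - sum(w in WINTER_WORDS for w in words))
```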
Distinguishing different grounding theories
We know that GPT-3 is good at textual correlations. If we feed it a collection of diary entries, it can generate a plausible facsimile of the next entry. Let's have two theories:
Let's model "predict the next day's weather" and "predict the next diary entry" as follows:
Here we call T_τ the diary entry at step τ and W_t the weather on day t (we'd like to say τ = t, but that's a feature of reality, not necessarily something GPT-3 would know). The transition function to the next diary entry is f_τ; the transition to the next day's weather is f_t. The relation r_TW maps from the diary to the London weather, while r_WT is its inverse.
So theory th1 is the theory that GPT-3 is modelling the transition T_τ → T_{τ+1} via f_τ, while theory th2 is the theory that GPT-3 is modelling that transition via r_WT ∘ f_t ∘ r_TW.
How can we distinguish these two theories? Remember that GPT-3 is a messy collection of weights, so it might implicitly be using various features without this being obvious. However, if we ourselves have a good understanding of the r's, T's, and W's, we could test the two theories. What we need to find are situations where fτ and rWT∘ft∘rTW would tend to give different predictions, and see what GPT-3 does in those situations.
For example, what if the last diary entry ended with "If it's sunny tomorrow, I'll go for a 2-day holiday - without you, dear diary"? If we think in terms of r_WT ∘ f_t ∘ r_TW (and if we understand what that entry meant), then we could reason:
Thus, if W_{t+1} is sunny, th2 predicts an empty T_{τ+1}, while th1, which extrapolates purely from the text, would not have any reason to suspect this.
Similarly, if f_t predicts some very extreme weather, th2 might predict a very long diary entry (if the diarist is trapped at home and writes more) or an empty one (if the diarist is trapped at work and doesn't get back in time to write anything).
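The divergence between the two theories can be made concrete with toy functions. Everything here is an illustrative assumption, not anything GPT-3 actually computes: th1 extrapolates purely within the text, while th2 maps text to weather, steps the weather forward, and maps back.

```python
# Toy versions of the two theories. th2 composes r_WT ∘ f_t ∘ r_TW;
# th1 only ever looks at the text stream.

def r_TW(entry: str) -> str:
    """Text -> weather: crude keyword reading of the diary entry."""
    return "sunny" if "sunny" in entry.lower() else "rainy"

def f_t(weather: str) -> str:
    """Weather dynamics: toy model in which tomorrow resembles today."""
    return weather

def r_WT(weather: str, last_entry: str) -> str:
    """Weather -> text: a diarist who leaves when it's sunny writes nothing."""
    if "if it's sunny tomorrow" in last_entry.lower() and weather == "sunny":
        return ""  # diarist is on holiday: empty entry
    return f"The weather was {weather} today."

def th1_predict(last_entry: str) -> str:
    """Purely textual extrapolation: carry on in the same style."""
    return "Dear diary, another day much like the last..."

def th2_predict(last_entry: str) -> str:
    return r_WT(f_t(r_TW(last_entry)), last_entry)
```

On the holiday entry, th2 predicts an empty next entry while th1 produces more diary-styled text: exactly the kind of divergence that would let us tell the theories apart.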
Winograd schemas
Winograd schemas operate in a similar way. Consider the typical example: "The trophy doesn't fit into the brown suitcase because it is too small/large." Which object does "it" refer to - the trophy or the suitcase?
Operating at the level of text, the difference between the two sentences is small. If the algorithm is operating at a level where it implicitly models trophies and suitcases, then it's easy to see how fitting and small/large relate:
That's why agents that model those sentences via physical models find Winograd sentences much easier than agents that don't.
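A minimal sketch of what a physical model buys here, using the classic trophy/suitcase schema ("The trophy doesn't fit into the brown suitcase because it is too small/large"). The size constraint is hand-coded for this one schema, so this illustrates the reasoning rather than solving Winograd schemas in general.

```python
# fit(contents, container) requires size(contents) < size(container).
# At the level of raw text the two sentences differ by one word; at the
# level of the physical model, the referent of "it" is forced.

def resolve_it(adjective: str) -> str:
    """Resolve 'it' given the adjective in '...because it is too X'.

    "Too large" can only block the fit if it describes the contents
    (the trophy); "too small" only if it describes the container
    (the suitcase).
    """
    role = {"large": "contents", "small": "container"}[adjective]
    return {"contents": "trophy", "container": "suitcase"}[role]
```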
Too much data
GPT-3 is actually showing some decent-seeming performance on Winograd schemas. Is this a sign that GPT-3 is implicitly modelling the outside world as in th2?
Perhaps. But another theory is that it's being fed so much data, it can spot textual patterns that don't involve modelling the outside world. For example, if the Winograd schemas are incorporated into its training data, directly or indirectly, then it might learn them just from text. And my examples from the diary/weather before: if the diary already includes sentences like "If it's sunny tomorrow, I'll go for a 2-day holiday - without you, dear diary" followed by an empty entry, then GPT-3 can extrapolate purely textually.
This allows GPT-3 to show better performance in the short term, but might be a problem in the long term. Essentially, extrapolating textually via th1 is easy for GPT-3; extrapolating via th2 is hard, as it needs to construct a world model and then ground this from the text it has seen. If it can solve its problem via a th1 approach, it will do so.
Thus GPT-3 (or a future GPT-n) might achieve better-than-human performance in most situations, while still failing in some human-easy cases. And it's actually hard to train it for the remaining cases, because the algorithm has to use a completely different approach from the one it uses successfully in most cases. Since the algorithm has no need to create grounded symbols to achieve a very high level of performance, its performance in these areas doesn't help at all in the remaining cases[5].
Unless we feed it the relevant data by hand, or scrape it off the internet, in which case GPT-n will have ever greater performance, without being able to model the outside world. It's for reasons like this that I suspect that GPT-n might never achieve true general intelligence.
Applying to the Chinese room
The Chinese room thought experiment is a classical argument about symbol grounding. I've always maintained that, though the central argument is wrong, the original paper and the reasoning around it have good insights. In any case, since I've claimed to have a better understanding of symbol grounding, let's see if that can give us any extra insights here.
In the thought experiment, a (non-Chinese reading) philosopher implements an algorithm by shuffling Chinese symbols within an enclosed room; this represents an AGI algorithm running within an artificial mind:
The red arrows and the framed network represent the algorithm, while the philosopher sits at the table, shuffling symbols. Whenever a hamburger appears to the AGI, the symbol 🀄 appears to the philosopher, who calls it the "Red sword".
So we have strong correlations here: hamburgers in the outside world, 🀄 in the AGI, and "Red sword" inside the philosopher's head.
Which of these symbols are well-grounded? Let's ignore for the moment the issue that AGIs (and humans) can think about hamburgers even when there aren't any present. Then the "Red sword" in the philosopher's head is correlated with 🀄 inside the AGI, and hence with the hamburger in the world.
To deepen the argument, let's assume that 🀅 is correlated with experiencing a reward from eating something with a lot of beef-taste:
If we wanted to understand what the AGI was "thinking" with 🀄 followed by 🀅, we might describe this as "that hamburger will taste delicious". This presupposes a sophisticated model on the part of the AGI's algorithm; something that incorporates a lot of caveats and conditions ("Hamburgers are only delicious if you actually eat them, and if they're fresh, and not covered in dirt, and if I'm not full up or feeling animal-friendly..."). Given all that, "the AGI is thinking that that hamburger will taste delicious" is an accurate model of what "🀄 followed by 🀅" means.
So 🀄 and 🀅 seem to be well-grounded concepts. To reach this conclusion, we've not only noted correlations, but noted that these correlations change with circumstances exactly the same way that we would expect them to. Assume a dumb neural net identified images of hamburgers and labelled them "HAMBURGER". Then HAMBURGER would correlate not with actual hamburgers, but with images of hamburgers presented in the right format to the neural net. In contrast, the AGI has 🀄 without necessarily needing to see a hamburger - it might smell it, feel it, or deduce its presence from other clues. That's why 🀄 is better grounded as an actual hamburger than the neural net's HAMBURGER.
What about "Red sword" and "Green house"? Within the philosopher's mind, those are certainly not well grounded as "hamburger" and "delicious". The philosopher's mental model is "Green house sometimes appears after Red sword"; something sometimes appearing after something else - that could be just about anything.
Nevertheless, we could see "Red sword" and "Green house" as grounded, simply as references to 🀄 and 🀅. So the philosopher sees 🀄, thinks "Red sword". Thus the philosopher thinking "Red sword" means 🀄, which means hamburger.
But this only applies when the philosopher is at work in the Chinese room. If the philosopher is talking with a friend on the outside, and mentions "Hey, 'Red sword' is often followed by 'Green house'," then those symbols aren't grounded. If someone breaks into the Chinese room and brandishes an actual "red sword" at the philosopher, that doesn't mean "hamburger" - it means the thought-experiment has gone tragically wrong. So "Red sword" and "Green house" seem more weakly grounded than 🀄 and 🀅.
Second example of generated text:
Third example of generated text:
I'm interpreting features broadly here. Something is a feature if it is useful for it to be interpreted as a feature - meaning a property of the world that is used within a model. So, for instance, air pressure is a feature in many models - even if it's "really" just an average of atomic collision energies.
This elides the whole issue as to whether an algorithm "really" has or uses feature X. A neural net, a transformer, a human brain: we could argue that, structurally, none of these have "subject-verb-object" features, since they are just a bunch of connections and weights. However, it is more useful for predictions to model GPT-3 and most humans as using sentence-order features, as this allows us to predict their output, even if we can't point to where verbs are treated within either.
I want to note that various no-free-lunch theorems imply that no agent's symbols (including humans') can ever be perfectly grounded. Any computable agent can be badly fooled about what's happening in the world, so that their internal symbols don't correspond to anything real.
This is somewhat similar to my old point that the best Turing tests are those that the algorithm was not optimised on, or on anything similar.