Some people have expressed that “GPT-2 doesn’t understand anything about language or reality. It’s just huge statistics.” In at least two senses, this is true.


First, GPT-2 has no sensory organs. So when it talks about how things look or sound or feel and gets it right, it is just because it read something similar on the web somewhere. The best understanding it could have is the kind of understanding one gets from reading, not from direct experiences. Nor does it have the kind of understanding that a person does when reading, where the words bring to mind memories of past direct experiences.


Second, GPT-2 has no qualia. This is related to the previous point, but distinct from it. One could imagine building a robotic body with cameras for eyes and microphones for ears that fed .png and .wav files to something like GPT-2 rather than .html files. Such a system would have what might be called experiences of the world. It would not, however, create a direct internal impression of redness or loudness, the ineffable conscious experience that accompanies sensation.


However, this is too high a bar to rule out understanding. Perhaps we should call the understanding that comes from direct personal experience “real understanding” and the kind that comes solely from reading, with no connection to personal experience, “abstract understanding.” Although I can’t “really understand” what it was like to fight in the Vietnam War (because I wasn’t there, man), I can still understand it in an abstract sense. With an abstract understanding, here are some things one can do:


• answer questions about it in one’s own words
• define it
• use it appropriately in a sentence
• provide details about it
• summarize it


Professional teachers distinguish between tests of knowledge (which can be handled by mere memorization) and tests of understanding, with the latter being more difficult and useful (see Bloom’s Taxonomy). Understanding requires connecting a new idea to ideas a student is already familiar with.


GPT-2 is able to pass many such tests of understanding. With an appropriate prompt (such as giving examples of what form the answer to a question should take) it is able to answer questions, define terms, use words appropriately in a sentence, provide details, and summarize.


This is understanding for most practical purposes. It shows that when GPT-2 uses a word, that word has the appropriate kinds of connections to other words. The word has been integrated into a large graph-like structure of relationships between what can reasonably be called concepts or ideas. When the probabilities for the next token are generated, the model has a certain propensity to use a particular word; but if that word is artificially blocked, other ways of saying the same thing have also been activated and will be used instead. It is reasonable to interpret this as having an “idea” of what it “wants” to “say,” and at some point the quotation marks are no longer helpful and we may as well dispense with them.
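That claim about blocked words can be tested directly. The Hugging Face transformers library, for instance, lets you forbid particular tokens during generation via the bad_words_ids argument of generate(). Here is a minimal sketch; the model size, prompt, and blocked word are my own illustrative choices, and argument names vary a little across library versions:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The capital of France is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Forbid the most obvious continuation and see what gets said instead.
blocked = tokenizer(["Paris", " Paris"], add_special_tokens=False).input_ids

output = model.generate(
    input_ids,
    bad_words_ids=blocked,   # these token sequences can never be generated
    do_sample=True,
    top_k=10,
    max_new_tokens=12,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0]))
```

Whether the completions that come out really express the same underlying idea is, of course, exactly the question at issue.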


Here is an example of these connections in action. I input the following prompt into GPT-2 1.5B, with top-k=10 sampling:


"Indiana Jones ducked as he entered the cave to avoid being decapitated." In this sentence, the word "decapitated" means


Here are the first results (each truncated after its first sentence):
• "to be cut down" as well as "to be slain."
• "to chop off".
• "to cut off one of the branches of a tree."
• "The captain of the ship was killed in the cave."
• "to cut off, cut off by decapitation."
• "cut off".
• "cut off."
• to be "sliced off."
• "to be killed," which is the same thing as "to be killed by the sword."
• to fall from high altitude or to be cut down.
• "to have a head chopped off."


The system has a strong notion that “decapitated” means “to cut off” and “to kill” but is less likely to mention that the word has anything to do with a head. So its concept of “decapitation” appears to be approximately (but not completely) right. When prompted to write a sentence using the word “decapitate,” the system usually generates sentences consistent with this: the word is typically used in a way that implies killing, but heads are only rarely mentioned. (This has all gotten rather grisly.)
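For anyone who wants to try this kind of probe themselves, here is a minimal sketch using the Hugging Face transformers library ("gpt2-xl" is the 1.5B model; the exact generate() argument names vary somewhat across library versions):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")  # GPT-2 1.5B
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

prompt = ('"Indiana Jones ducked as he entered the cave to avoid being '
          'decapitated." In this sentence, the word "decapitated" means')
input_ids = tokenizer.encode(prompt, return_tensors="pt")

outputs = model.generate(
    input_ids,
    do_sample=True,
    top_k=10,                  # top-k=10 sampling, as above
    num_return_sequences=10,   # draw ten independent completions
    max_new_tokens=30,
    pad_token_id=tokenizer.eos_token_id,
)
for out in outputs:
    completion = tokenizer.decode(out[input_ids.shape[1]:])
    print(completion.split(".")[0] + ".")   # keep roughly the first sentence
```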


However, one shouldn't take this too far. GPT-2 uses concepts in a very different way than a person does. In the paper “Evaluating Commonsense in Pre-trained Language Models,” the probability of generating each of a pair of superficially similar sentences is measured. If the system is correctly and consistently applying a concept, then one of the two sentences will have a high probability and the other a low probability of being generated. For example, given the four sentences


1. People need to use their air conditioner on a hot day.
2. People need to use their air conditioner on a lovely day.
3. People don’t need to use their air conditioner on a hot day.
4. People don’t need to use their air conditioner on a lovely day.


Sentences 1 and 4 should have higher probability than sentences 2 and 3. What they find is that GPT-2 does worse than chance on these kinds of problems. If a sentence is likely, a variation on the sentence with opposite meaning tends to have similar likelihood. The same problem occurred with word vectors, like word2vec. “Black” is the opposite of “white,” but apart from the one dimension in which they differ, nearly everything else about them is the same: you can buy a white or black crayon, you can paint a wall white or black, you can use white or black to describe a dog’s fur. Because of this, black and white are semantically close, and tend to get confused with each other.
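Returning to the sentence-pair measurement: it is straightforward to sketch how such comparisons can be made, by scoring each sentence with the total log-probability its tokens receive from the model. The snippet below is a generic reconstruction of that idea using the Hugging Face transformers library, not the paper's actual evaluation code:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_log_prob(sentence):
    """Total log-probability the model assigns to the sentence's tokens."""
    input_ids = tokenizer.encode(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels supplied, the model returns the mean cross-entropy
        # over the predicted positions; undo the averaging to get a total.
        loss = model(input_ids, labels=input_ids).loss
    return -loss.item() * (input_ids.shape[1] - 1)

sentences = [
    "People need to use their air conditioner on a hot day.",
    "People need to use their air conditioner on a lovely day.",
    "People don't need to use their air conditioner on a hot day.",
    "People don't need to use their air conditioner on a lovely day.",
]
for s in sentences:
    print(f"{sentence_log_prob(s):8.2f}  {s}")
```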


The underlying reason for this issue appears to be that GPT-2 has only ever seen sentences that make sense, and is trying to generate sentences that are similar to them. It has never seen sentences that do NOT make sense and makes no effort to avoid them. The paper “Don't Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training” introduces such an “unlikelihood objective” and shows it can help with precisely the kinds of problems mentioned in the previous paper, as well as GPT-2’s tendency to get stuck in endless loops.
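The core of that objective is simple: alongside the usual likelihood term that pushes up the probability of observed tokens, add a term that pushes down the probability of tokens that should not appear (recent repetitions, contradictions of the context, and so on). Here is a rough PyTorch sketch of that extra penalty, based on my reading of the unlikelihood-training work rather than its released code:

```python
import torch

def unlikelihood_term(next_token_log_probs, negative_token_ids):
    """Penalty that grows as the model puts probability on unwanted tokens.

    next_token_log_probs: 1-D tensor of log-probabilities over the vocabulary
        at a given position.
    negative_token_ids: 1-D tensor of token ids the model should avoid here.
    """
    p_neg = next_token_log_probs[negative_token_ids].exp()
    # -log(1 - p) is near zero when p is tiny and blows up as p approaches 1,
    # so gradient descent on this term actively suppresses the unwanted tokens.
    return -torch.log(torch.clamp(1.0 - p_neg, min=1e-6)).sum()
```

During training, a penalty like this is added, with a weighting coefficient, to the ordinary cross-entropy loss.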


Despite all this, when generating text, GPT-2 is more likely to generate a true sentence than the opposite of a true sentence. “Polar bears are found in the Arctic” is far more likely to be generated than “Polar bears are found in the tropics,” and it is also more likely to be generated than “Polar bears are not found in the Arctic” because “not found” is a less likely construction to be used in real writing than “found.”


It appears that what GPT-2 knows is that the concept “polar bear” has a “found in” relation to “Arctic,” but that it is not very particular about the polarity of that relation (“found in” vs. “not found in”). It simply defaults to expressing the more commonly used positive polarity much of the time.


Another odd feature of GPT-2 is that its writing expresses equal confidence in concepts and relationships it knows very well and in those it knows very little about. By looking into the probabilities, we can often determine when GPT-2 is uncertain about something, but this uncertainty is not expressed in the sentences it generates. By the same token, if prompted with text that has a lot of hedge words and uncertainty, it will include those words even if the topic is one it knows a great deal about.


Finally, GPT-2 doesn’t make any attempt to keep its beliefs consistent with one another. Given the prompt “The current President of the United States is named,” most of the generated responses will be variations on “Barack Obama.” With other prompts, however, GPT-2 acts as if Donald Trump is the current president. This contradiction was present in the training data, which was created over the course of several years. The token probabilities show that both men’s names have a fairly high likelihood of being generated for any question of this kind. A person who discovered that kind of uncertainty between two options in their own mind would modify their beliefs so that one became more likely and the other less likely, but GPT-2 doesn’t have any mechanism to do this and enforce consistency on its beliefs.
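This kind of split belief is easy to see by inspecting the next-token distribution directly. Here is a minimal sketch; the choice of name tokens is illustrative, and in practice one has to be a little careful about how names split into sub-tokens:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")  # GPT-2 1.5B
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
model.eval()

prompt = "The current President of the United States is named"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(input_ids).logits[0, -1]   # scores for the next token only
probs = torch.softmax(logits, dim=-1)

for name in [" Barack", " Donald"]:
    first_piece = tokenizer.encode(name)[0]   # first sub-token of the name
    print(f"P({name!r} next) = {probs[first_piece].item():.4f}")
```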


In summary, it seems that GPT-2 does have something that can reasonably be called “understanding” and holds something very much like “concepts” or “ideas” which it uses to generate sentences. However, there are some profound differences between how a human holds and uses ideas and how GPT-2 does, which are important to keep in mind.

23 comments

The interesting content kept me reading, but it would help the reader to have lines between paragraphs in the post.

Some people have expressed that “GPT-2 doesn’t understand anything about language or reality. It’s just huge statistics.” In at least two senses, this is true.

My complaint is that GPT-2 isn't able to reason with whatever "understanding" it has (as shown by FeepingCreature's example “We are in favor of recycling, because recycling doesn’t actually improve the environment, and that’s why we are against recycling.”) which seems like the most important thing we want in an AI that "understands language".

With an abstract understanding, here are some things one can do: • answer questions about it in one’s own words • define it • use it appropriately in a sentence • provide details about it • summarize it

I suggest that these are all tests that in a human highly correlates with being able to reason with a concept (which again is what we really want) but the correlation apparently breaks down when we're dealing with AI, so the fact that an AI can pass these tests doesn't mean as much as it would with a human.

At this point we have to decide whether we want the word "understand" to mean "... and is able to reason with it" and I think we do because if we say "GPT-2 understands language" then a lot of people will misinterpret that as meaning that GPT-2 can do verbal/symbolic reasoning, and that seems worse than the opposite confusion, where we say "GPT-2 doesn't understand language" and people misinterpret that as meaning that GPT-2 can't give definitions or summaries.

One way we might choose to draw these distinctions is using the technical vocabulary that teachers have developed. Reasoning about something is more than mere Comprehension: it would be called Application, Analysis or Synthesis, depending on how the reasoning is used.

GPT-2 actually can do a little bit of deductive reasoning, but it is not very good at it.

One way we might choose to draw these distinctions is using the technical vocabulary that teachers have developed. Reasoning about something is more than mere Comprehension: it would be called Application, Analysis or Synthesis, depending on how the reasoning is used.

So would you say that GPT-2 has Comprehension of "recycling" but not Comprehension of "in favor of" and "against", because it doesn't show even the basic understanding that the latter pair are opposites? I feel like even teachers' technical vocabulary isn't great here because it was developed with typical human cognitive development in mind, and AIs aren't "growing up" the same way.

So would you say that GPT-2 has Comprehension of "recycling" but not Comprehension of "in favor of" and "against", because it doesn't show even the basic understanding that the latter pair are opposites?

Something like that, yes. I would say that the concept "recycling" is correctly linked to "the environment" by an "improves" relation, and that it Comprehends "recycling" and "the environment" pretty well. But some texts say that the "improves" relation is positive, and some texts say it is negative ("doesn't really improve") and so GPT-2 holds both contradictory beliefs about the relation simultaneously. Unlike humans, it doesn't try to maintain consistency in what it expresses, and doesn't express uncertainty properly. So we see what looks like waffling between contradictory strongly held opinions in the same sentence or paragraph.

As for whether the vocabulary is appropriate for discussing such an inhuman contraption or whether it is too misleading to use, especially when talking to non-experts, I don't really know. I'm trying to go beyond descriptions of GPT-2 "doesn't understand what it is saying" and "understands what it is saying" to a more nuanced picture of what capabilities and internal conceptual structures are actually present and absent.

[anonymous]

This article is attacking a straw man. The skeptics (I am one) are saying that GPT-2 doesn’t understand what it is reading or writing because it is unable to abstract reading into concepts or apply its writing skills to communicate complex ideas. Even more importantly I would say it is uninteresting by itself because it lacks any form of original concept creation and concept modeling.

Without further architecturally different additions it is just a remixing engine that can do nothing more than present new perspectives on things it has already consumed. Like the GLUT, the intelligence that it shows is just a funhouse mirror reflection of the intelligence that went into creating the content it consumed in training.

I don't think I am attacking a straw man: You don't believe GPT-2 can abstract reading into concepts, and I was trying to convince you that it can. I agree that current versions can't communicate ideas too complex to be expressed in a single paragraph. I think it can form original concepts, in the sense that 3-year old children can form original concepts. They're not very insightful or complex concepts, and they are formed by remixing, but they are concepts.

[anonymous]

Ok I think we are talking past each other, hence the accusation of a straw man. When you say "concepts" you are referring to the predictive models, both learned knowledge and dynamic state, which DOES exist inside an instance of GPT-2. This dynamic state is initialized with the input, at which point it encodes, to some degree, the content of the input. You are calling this "understanding."

However when I say "concept modeling" I mean the ability to reason about this at a meta-level. To be able to not just *have* a belief which is useful in predicting the next token in a sequence, but to understand *why* you have that belief, and use that knowledge to inform your actions. These are 'lifted' beliefs, in the terminology of type theory, or quotations in functional programming. So to equate belief (predictive capability) and belief-about-belief (understanding of predictive capability) is a type error from my perspective, and does not compute.

GPT-2 has predictive capabilities. It does not instantiate a conceptual understanding of its predictive capabilities. It has no self-awareness, which I see as a prerequisite for "understanding."

Yeah, you're right. It seems like we both have a similar picture of what GPT-2 can and can't do, and are just using the word "understand" differently.

Sentences 1 and 4 should have higher probability than sentences 2 and 3. What they find is that GPT-2 does worse than chance on these kinds of problems. If a sentence is likely, a variation on the sentence with opposite meaning tends to have similar likelihood.

I can anecdotally confirm this; I've been personally calling this the "GPT swerve", i.e. sentences of the form "We are in favor of recycling, because recycling doesn't actually improve the environment, and that's why we are against recycling."

The proposed explanation makes sense as well. Is anyone trying to pre-train a GPT-2 with unlikelihood avoidance?

Nice analysis!

The only part I'm skeptical about is the qualia argument. If it's supposed to be ineffable, why be sure it doesn't have it? If it's effable after all, then we can be more specific: for example, we might want words to be associated with abstract representations of sensory experience, which can be used for things like imagination, composition with other concepts, or top-down control.

My thinking was that since everything it knows is something that was expressed in words, and qualia are thought to not be expressed fully in words, then qualia aren't part of what it knows. However, I know I'm on shaky ground whenever I talk about qualia. I agree that one can't be sure it doesn't have qualia, but it seems to me more like a method for tricking people into thinking it has qualia than something that actually does.

Sentences 1 and 4 should have higher probability than sentences 2 and 3. What they find is that GPT-2 does worse than chance on these kinds of problems. If a sentence is likely, a variation on the sentence with opposite meaning tends to have similar likelihood.

...

Despite all this, when generating text, GPT-2 is more likely to generate a true sentence than the opposite of a true sentence. “Polar bears are found in the Arctic” is far more likely to be generated than “Polar bears are found in the tropics,” and it is also more likely to be generated than “Polar bears are not found in the Arctic” because “not found” is a less likely construction to be used in real writing than “found.”

Hm. These sound contradictory to me?

My understanding is that a sentence's probability of being generated is closely related to its likelihood; closely enough that if a sentence has similar likelihood as its negation, it should have similar probability of generation, and vice versa. But then the first quote says "true sentences have similar, but lower likelihood than their negations" and the second says "true sentences have higher likelihood than their negations".

Assuming I've got that right, what gives?

Related question: what's the precise ranking of sentences 1-4? The quote suggests that some aggregation of 2 and 3 is ranked higher than the same aggregation of 1 and 4; but is it 2>3>1>4, or 2>1>3>4, or what?

What GPT-2 SAT score would convince the skeptics?

[anonymous]

I fail to see the relevance of such a score?

Yeah, skipped a step, sorry. SAT tends to be touted as a "measure [of] literacy, numeracy and writing skills that are needed for academic success in college. They state that the SAT assesses how well the test-takers analyze and solve problems." If so, then an AI that can do well on this test is expected to be able to learn to "analyze and solve problems" in a rather general range. At that point the argument about whether the AI can "keep its beliefs consistent with one another", at least as much as a human can, which is not very much, would become moot. The test is also standardized and really not easy to game for a human, even with intensive preparation, so it's not nearly as subjective as various Turing tests. Hope this makes sense.


[anonymous]

First off standardized tests are incredibly easy to game by humans and there is an entire industry around it. My roommate in college tutored people on this and routinely sat and scored perfect scores on all standardized entrance examinations (SAT, ACT, LSAT, GRE, etc.) as an advertising ploy. This is despite scoring mediocre when he took it the first time for college and not being intrinsically super bright or anything. The notion that these are a real test for anything other than teachable test taking skills is propaganda from the testing industry. Most prestigious schools are in the process of removing standardized tests from entrance consideration since it is demonstrated to be a poor heuristic for student performance.

But there is an even more fundamental issue I think, which is that GPT-2 more resembles a compressed GLUT or giant Markov chain than it does a thinking program that computes intelligent solutions for itself.

Scott seems to disagree:

Despite popular misconceptions, the SAT is basically an IQ test, and doesn’t really reward obsessive freaking out and throwing money at the problem.

I am not inclined to argue about this particular point though. Scott tends to know what he writes about and whenever his mistakes are pointed out he earnestly adds a post to the list of mistakes. So I go with his take on it.

But there is an even more fundamental issue I think, which is that GPT-2 more resembles a compressed GLUT or giant Markov chain than it does a thinking program that computes intelligent solutions for itself.

Maybe. I don't know enough about either the brain architecture (which looks like a hodge podge of whatever evolution managed to cobble together) or the ML architecture (which is probably not much closer to intelligent design), and I do not really care. As long as AI behaves like an IQ 120+ human, I would happily accept a mix of GLUTs and Markov chains as a reasonable facsimile of intelligence and empathy.

[anonymous]

As long as AI behaves like an IQ 120+ human, I would happily accept a mix of GLUTs and Markov chains as a reasonable facsimile of intelligence and empathy.

It doesn’t though, that’s the point! It cannot form plans. It cannot work towards coherent, long term goals, or really operate as an agent at all. It is unable to form new concepts and ideas. It is a very narrow AI only really able to remix its training data in a way that appears on the surface to approximate human writing style. That’s all it can do.

I see your stance, and it looks like further discussion is no longer productive. We'll see how things turn out.

[anonymous]

I don’t care who disagrees. If he’s got statistics then I defy the data. This is something you can go out and test in the real world. Get a practice test book, test yourself on one timed test, learn some techniques, test yourself on the next test to see what difference it makes, and repeat. I’ve done this and the effect is very real. Training centers have demonstrated the effect with large group sizes.

Great post. I am not sure I'd go as far as “abstract understanding”, feels more like "Extreme memorization" to me. Yet there's no question that there is *some kind* of understanding.

As far as qualia, yes, never say never, but seems to me the word can only make sense and be used within a general AI framework where concepts like causality, recursion, self-reference, etc ... have been cracked and implemented. Something that GPT-2 (and the entire AI field) is far, far from having achieved.

Same points could also be made about the remarkable Aristo ( https://arxiv.org/abs/1909.01958 )