I don't think there are necessarily any specific examples in the training data. LLMs can generalize to text outside of the training distribution.
Oh, there's tons and tons of this kind of data online, I bet. Even GPT-3 could do 'ELI5', remember (and I wouldn't be surprised if GPT-2 could too, since it could do 'tl;dr'). You have stuff like Simple English Wiki, you have centuries of children's literature (which will often come with inline metadata like "Newbery Award winner" or "a beloved classic of children's literature" or "recommended age range: 6-7yo"), you have children's dictionaries ('kid dictionary', 'student dictionary', 'dictionary for kids', 'elementary dictionary'), you will have lots of style-parody text-transfer examples where someone rewrites "X but if it were a children's novel", you have 'young adult literature' as an intermediate, textbook anthologies of writing aimed at specific grades, micro-genres like "Anglish" or "Up-Goer-Five" (the latter aimed partially at children)...
No, there's nothing impressive or 'generalizing' about this. This is all well within-distribution.
If anything, rather than being surprisingly good, the given definitions seem kinda... insulting and bad and age-inappropriate and like ChatGPT is condescending rather than generating a useful pedagogically-age-appropriate definition? Here's an actual dictionary-for-children defining 'cat': https://kids.wordsmyth.net/we/?rid=6468&ent_l=cat
a small, furry mammal with whiskers, short ears, and a long tail. Cats, also called house cats, are often kept as pets or to catch mice and rats.
any of the larger wild animals related to the kind of cat kept as a pet. Tigers, lions, and bobcats are all cats. Cats are carnivorous mammals.
Which is quite different from
Cat: A soft, furry friend that says "meow" and loves to play and cuddle.
(this is more of a pre-k or toddler level definition)
or 11yo:
Cat: Cats are furry animals with pointy ears, a cute nose, and a long tail. They like to nap a lot, chase things like strings or toys, and sometimes purr when they're happy.
Which is, er... I was a precociously hyper-literate 11yo, as I expect most people reading LW were, but I'm pretty sure even my duller peers in 6th or 7th grade in middle school, when we were doing algebra and setting up school-sized exhibits about the Apollo space race and researching it in Encyclopedia Britannica & Encarta and starting to upgrade to the adult dictionaries and AIM chatting all hours, would've been insulted to be given a definition of 'cat' like that...
I assume OP thought that there was some specific place in the training data the LLM was replicating.
Indeed, and my point is that that seems entirely probable. He asked for a dictionary definition of words like 'cat' for children, and those absolutely exist online and are easy to find, and I gave an example of one for 'cat'.
(And my secondary point was that ironically, you might argue that GPT is generalizing and not memorizing... because its definition is so bad compared to an actual Internet-corpus definition for children, and is bad in that instantly-recognizable ChatGPTese condescending talking-down bureaucrat smarm way. No human would ever define 'cat' for 11yos like that. If it was 'just memorizing', the definitions would be better.)
Whatever one means by "memorize" is by no means self-evident. If you prompt ChatGPT with "To be, or not to be," it will return the whole soliloquy. Sometimes. Other times it will give you an opening chunk and then an explanation that it's the well-known soliloquy, etc. By poking around I discovered that I could elicit the soliloquy by giving it prompts consisting of syntactically coherent phrases, but if I gave it prompts that were not syntactically coherent, it didn't recognize the source, at least not until a bit more prompting. I've never found the idea that LLMs were just memorizing to be very plausible.
In any event, here's a bunch of experiments explicitly aimed at memorization, including the Hamlet soliloquy stuff: https://www.academia.edu/107318793/Discursive_Competence_in_ChatGPT_Part_2_Memory_for_Texts_Version_3
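For what it's worth, here is a minimal sketch of how one might automate that kind of probe. My own poking around was done by hand in the ChatGPT interface; the OpenAI Python SDK calls and the model name below are assumptions for illustration only.

```python
# Illustrative sketch only: probing for memorized text with a syntactically
# coherent prefix vs. a scrambled one. Assumes the OpenAI Python SDK (v1.x)
# and an API key in OPENAI_API_KEY; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

PROBES = {
    "coherent phrase": "To be, or not to be, that is the question:",
    "scrambled fragment": "be not to, or to be, question that the is:",
}

def probe(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Send the bare prefix in a fresh conversation and return the continuation."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

for label, prefix in PROBES.items():
    print(f"--- {label} ---")
    print(probe(prefix)[:300])  # does the model pick up the soliloquy?
```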
I was assuming lots of places widely spread. What I was curious about was a specific connection in the available data between the terms I used in my prompts and the levels of language. gwern's comment satisfies that concern.
Of course, but it does need to know what a definition is. There are certainly lots of dictionaries on the web. I'm willing to assume that some of them made it into the training data. And it needs to know that people of different ages use language at different levels of detail and abstraction. I think that requires labeled data, like children's stories labeled as such.
I think that requires labeled data.
It doesn't, and the developers don't label the data. The LLM learns during training that these categories exist, because it can and because doing so helps minimize the loss function.
By labeled data I simply mean that children's stories are likely to be identified as such in the data. Children's books are identified as children's books. Otherwise, how is the model to "know" what language is appropriate for children? Without some link between the language and a certain class of people it's just more text. My prompt specifies 5-year olds. How does the model connect that prompt with a specific kind of language?
This is cross-posted from New Savanna.
The difference between concrete concepts, that is, concepts that can be understood entirely in sensorimotor terms, and abstract concepts is an important one. It was, for example, important to David Hays when I studied with him back in the 1970s. We ended up adapting a model developed by William Powers as a way of thinking about concrete concepts while using Hays’s concept of metalingual definition to think about abstract concepts. Somewhat later Stevan Harnad coined the term “symbol grounding problem” to indicate that the meanings of symbols had to somehow be grounded in the physical world. In 2016 Harnad and his colleagues investigated the structure of dictionaries and discovered that the vast majority of terms in them are defined in terms of other words, but that there is a small Kernel that is not so defined. I wondered how that distinction would play out in ChatGPT.
ChatGPT – that is, the LLM underlying it – doesn’t have access to the physical world. It can’t ground any terms in sensorimotor experience. However, knowing a great deal about how language works and having spent a great deal of time working with ChatGPT, I was sure that it would be perfectly capable of defining concrete terms. But how would it do so?
I decided to investigate. I made a list of ten concrete terms and asked ChatGPT to define them. In my first trial I made no further specification beyond simply asking for a definition. In subsequent trials I asked for definitions appropriate for 5-year-olds, 11-year-olds, and professional biologists (just the biological concepts). ChatGPT gave distinctly different kinds of definitions in each case, which did not surprise me since over a year ago I had asked ChatGPT to describe digestion, first generically, and then in terms appropriate for a 3-year-old. It did so, and fairly successfully.
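For readers who want to rerun the trials systematically, here is a minimal sketch of the same experiment scripted against the OpenAI API. This is an assumption-laden illustration, not what I did: my trials were run by hand, each in a separate ChatGPT session, and the SDK calls and model name below are placeholders. The prompts and word lists are the ones used in the trials that follow.

```python
# Illustrative sketch only: scripting the four definition trials against the
# OpenAI API. Assumes the OpenAI Python SDK (v1.x) and an API key in
# OPENAI_API_KEY; the model name is a placeholder for whatever backs ChatGPT.
from openai import OpenAI

client = OpenAI()

WORDS = ["cat", "rain", "tulip", "car", "food", "tree", "fire", "wind", "hole", "animal"]
BIO_WORDS = ["cat", "tulip", "food", "tree", "animal"]  # biologist trial uses only the biological terms

TRIALS = {
    "generic": ("Define the following words:", WORDS),
    "five-year-old": ("Define the following words in a way that is appropriate for a five-year-old child:", WORDS),
    "eleven-year-old": ("Define the following words in a way that is appropriate for an eleven-year-old child:", WORDS),
    "biologist": ("Define the following words the way a professional biologist might:", BIO_WORDS),
}

def run_trial(instruction: str, words: list[str], model: str = "gpt-4o-mini") -> str:
    """One trial = one fresh conversation containing a single prompt."""
    prompt = instruction + "\n" + "\n".join(words)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

for label, (instruction, words) in TRIALS.items():
    print(f"=== {label} ===")
    print(run_trial(instruction, words))
```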
Here are the trials. Each trial is a separate session. As always, my prompt is in boldface while the response is in plainface.
Initial Trial
Define the following words:
cat
rain
tulip
car
food
tree
fire
wind
hole
animal
Sure, here are definitions for each of those words:
Comments: The vocabulary is only moderately sophisticated, with various abstract terms, which is more or less what I expected. Since I wanted to see what it would do without using abstract terms, I asked ChatGPT to produce definitions for children.
Five-year-old
Define the following words in a way that is appropriate for a five-year-old child:
[same list as above]
Sure!
Comment: There are almost no abstract terms in these definitions. I’ve highlighted possible abstract terms in yellow.
11-year-old
Define the following words in a way that is appropriate for an eleven-year-old child:
[same list as above]
Comment: These aren’t very different from the ones for 5-year-olds. The terms are pretty much the same, but these definitions are a bit longer.
A Professional Biologist
Define the following words the way a professional biologist might:
cat
tulip
food
tree
animal
Sure, here are definitions of each term from a biological perspective:
Comment: That’s pretty much what I expected, lots of abstract technical terms.
We seem to have three basic “levels” of definition for these terms: what I’ll call Generic; Child, for the 5- and 11-year-olds; and Scientific, for the biologist.
How did ChatGPT learn to make such distinctions?
It seems unlikely to me that it did this by taking accounts of children, professional scientists, and ordinary adults and deriving the appropriate kinds of definitions from those accounts. It seems more likely that it worked from examples of language appropriate to these groups. Given that the LLM was trained on more or less the whole internet, such examples were available, though I’d like to know what kinds of examples of child-focused language it was working from. I’d also like to know how these levels of discourse are organized within the LLM. Level of discourse would seem to be orthogonal to subject area. With 175 billion parameters, there are obviously many ways to skin this cat, as it were.