Cross-posted from New Savanna.
But it may also be flat-out wrong. We’ll see when we get a better idea of how inference works in the underlying language model.
* * * * *
Yes, I know that ChatGPT is trained by having it predict the next word, and the next, and the next, for billions and billions of words. The result of all that training is that ChatGPT builds up a complex structure of weights on the 175 billion parameters of its model. It is that structure that emits word after word during inference. Training and inference are two different processes, but that point is not well-made in accounts written for the general public.
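To make the distinction concrete, here is a toy sketch of my own, nothing like a real transformer: “training” compiles statistics from a corpus into a frozen table, and “inference” then uses that frozen table to emit one word after another. The two loops are entirely separate processes.

```python
from collections import Counter, defaultdict

# Toy illustration only: a bigram table stands in for the 175 billion
# learned weights of a real model.
corpus = "once upon a time there was a squirrel and the squirrel was brave".split()

# Training: build up the statistics (which word follows which).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# Inference: the table is now fixed; we just emit the most likely
# next word, over and over.
def continue_text(word, n=4):
    out = [word]
    for _ in range(n):
        if word not in follows:
            break
        word = follows[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

print(continue_text("once"))  # once upon a time there
```

Accounts written for the general public tend to collapse these two loops into one.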
Let's get back to the main thread.
I maintain, for example, that when ChatGPT begins a story with the words “Once upon a time,” which it does fairly often, it “knows” where it is going and that its choice of words is conditioned on that “knowledge” as well as on the prior words in the stream. It has invoked a ‘storytelling procedure’ and that procedure conditions its word choice. Just what that procedure is, and how it works, I don’t know, nor do I know how it is invoked. I do know that it is not invoked by the phrase “once upon a time,” since ChatGPT doesn’t always use that phrase when telling a story. Rather, that phrase is called up by the procedure.
Consider an analogy from jazz. When I set out to improvise a solo on, say, “A Night in Tunisia,” I don’t know what notes I’m going to play from moment to moment, much less do I know how I’m going to end, though I often know when I’m going to end. How do I know that? That’s fixed by the convention in place at the beginning of the tune; that convention specifies how many choruses you’re going to play. So, I’ve started my solo. My note choices are, of course, conditioned by what I’ve already played. But they’re also conditioned by my knowledge of when the solo ends.
Something like that must be going on when ChatGPT tells a story. It’s not working against time in the way a musician is, but it does have a sense of what is required to end the story. And it knows what it must do, what kinds of events must take place, in order to get from the beginning to the end. In particular, I’ve been working with stories where the trajectories have five segments: Donné, Disturb, Plan, Execute, Celebrate. The whole trajectory is ‘in place’ when ChatGPT begins telling the story. If you think of the LLM as a complex dynamical system, then the trajectory is a valley in the system’s attractor landscape.
Nor is it just stories. Surely it enacts a different trajectory when you ask it a factual question, request a recipe (as I recently did, for Cornish pasty), or have it generate some computer code.
With that in mind, consider a passage from a recent video by Stephen Wolfram (note: Wolfram doesn’t start speaking until about 9:50):
Starting at roughly 12:16, Wolfram explains:
It is trying to take an initial piece of text that you might give it and continue that piece of text in a reasonable, human-like way, one that is sort of characteristic of typical human writing. So, you give it a prompt, you say something, you ask something, and it’s kind of thinking to itself, “I’ve read the whole web, I’ve read millions of books, how would those typically continue from this prompt that I’ve been given? What’s the reasonable expected continuation based on some kind of average of a few billion pages from the web, a few million books, and so on?” So, that’s what it’s always trying to do; it’s always trying to continue from the initial prompt that it’s given. It’s trying to continue in a statistically sensible way.
Let’s say that you had given it, you had said initially, “The best thing about AI is its ability to...” Then ChatGPT has to ask, “What’s it going to say next?”
I don’t have any problem with that (which, BTW, is similar to a passage near the beginning of his recent article, What Is ChatGPT Doing … and Why Does It Work?). Of course ChatGPT is “trying to continue in a statistically sensible way.” We’re all more or less doing that when we speak or write, though there are times when we may set out to be deliberately surprising – but we can set such complications aside. My misgivings set in with this next statement:
Now one thing I should explain about ChatGPT, that’s kind of shocking when you first hear about it, is that those essays it’s writing, it’s writing them one word at a time. *As it writes each word it doesn’t have a global plan about what’s going to happen.* It’s simply saying, “What’s the best word to put down next based on what I’ve already written?”
It’s the italicized passage that I find problematic. That story trajectory looks like a global plan to me. It is a loose plan; it doesn’t dictate specific sentences or words, but it does specify general conditions that are to be met.
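For concreteness, here is what that one-word-at-a-time picture looks like as code, a toy sketch of my own with a stand-in probability function rather than a real model. Notice that the loop itself carries no explicit plan variable; if a trajectory exists, it has to live inside the weights that compute the probabilities.

```python
import random

# Toy sketch of autoregressive generation. The vocabulary and the
# probability function are stand-ins, not a real model or API.
def next_word_probs(context):
    # A real LLM would run one full forward pass through all its
    # weights here, conditioned on every word in `context`.
    vocab = ["once", "upon", "a", "time", "there", "was", "end"]
    return {w: 1.0 / len(vocab) for w in vocab}  # uniform, for the toy

def generate(prompt, n_words=10, seed=0):
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(n_words):
        probs = next_word_probs(words)   # conditioned only on prior words
        choices = list(probs)
        weights = list(probs.values())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("Once upon a time", n_words=5))
```

The question at issue is whether everything that shapes the story really reduces to what happens inside `next_word_probs`.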
Now, much later in his talk Wolfram says something like this (I don’t have the timestamp, so I’m quoting from his paper):
If one looks at the longest path through ChatGPT, there are about 400 (core) layers involved—in some ways not a huge number. But there are millions of neurons—with a total of 175 billion connections and therefore 175 billion weights. And one thing to realize is that every time ChatGPT generates a new token, it has to do a calculation involving every single one of these weights.
If ChatGPT visits every parameter each time it generates a token, that sure looks “global” to me. What is the relationship between these global calculations and those story trajectories? I surely don’t know.
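Some back-of-envelope arithmetic gives a feel for the scale. Using the rough standard estimate of about two floating-point operations per weight per token for a forward pass (my assumption, not a figure from Wolfram):

```python
# Back-of-envelope: every generated token touches every weight.
# Assuming ~2 floating-point operations per weight per token (one
# multiply, one add) — a common rough estimate, not Wolfram's figure.
n_weights = 175e9
flops_per_token = 2 * n_weights
print(f"per token: {flops_per_token:.2e} FLOPs")

story_tokens = 500  # a short story's worth of tokens, for illustration
print(f"per story: {story_tokens * flops_per_token:.2e} FLOPs")
```

On that estimate, a single token costs on the order of 350 billion operations, and a short story costs hundreds of trillions, every one of them sweeping through the full set of weights.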
Perhaps it’s something like this: A story trajectory is a valley in the LLM’s attractor landscape. When it tells a story it enters the valley at one end and continues through to the end, where it exits the valley. That long circuit that visits each of those 175 billion weights in the course of generating each token, that keeps it in the valley until it reaches the other end.
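To illustrate the metaphor, and nothing more than the metaphor, here is a one-dimensional toy: gradient descent on a double-well landscape. A point dropped near either valley settles into it and stays there, the way I’m suggesting a story trajectory, once entered, constrains everything that follows.

```python
# Loose illustration of the attractor-landscape metaphor (my analogy,
# not a claim about real LLM dynamics): descend the double-well
# potential V(x) = (x^2 - 1)^2, which has valleys at x = -1 and x = 1.
def grad_V(x):
    return 4 * x * (x * x - 1)  # derivative of (x^2 - 1)^2

def settle(x, steps=200, lr=0.05):
    for _ in range(steps):
        x -= lr * grad_V(x)  # step downhill
    return x

print(settle(0.3))   # settles near the valley at x = 1
print(settle(-0.3))  # settles near the valley at x = -1
```

Which valley the point ends up in is decided early, by which side of the ridge it starts on; after that, every step is locally determined yet globally constrained.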
I am reminded, moreover, of the late Walter Freeman’s conception of consciousness as arising through discontinuous whole-hemisphere states of coherence succeeding one another at a “frame rate” of 6 Hz to 10 Hz – something I discuss in “Ayahuasca Variations” (2003). It’s the whole-hemisphere aspect that’s striking (and somewhat mysterious), given the complex connectivity across many scales and the relatively slow speed of neural conduction.
* * * * *
I was alerted to this issue by a remark made at the blog, Marginal Revolution. On December 20, 2022, Tyler Cowen had linked to an article by Murray Shanahan, Talking About Large Language Models. A commenter named Nabeel Q remarked:
LLMs are *not* simply “predicting the next statistically likely word”, as the author says. Actually, nobody knows how LLMs work. We do know how to train them, but we don’t know how the resulting models do what they do.
Consider the analogy of humans: we know how humans arose (evolution via natural selection), but we don’t have perfect models of how humans work; we have not solved psychology and neuroscience yet! A relatively simple and specifiable process (evolution) can produce beings of extreme complexity (humans).
Likewise, LLMs are produced by a relatively simple training process (minimizing loss on next-token prediction, using a large training set from the internet, Github, Wikipedia etc.) but the resulting 175 billion parameter model is extremely inscrutable.
So the author is confusing the training process with the model. It’s like saying “although it may appear that humans are telling jokes and writing plays, all they are actually doing is optimizing for survival and reproduction”. This fallacy occurs throughout the paper.
This is why the field of “AI interpretability” exists at all: to probe large models such as LLMs, and understand how they are producing the incredible results they are producing.
I don’t have any reason to think Wolfram was subject to that confusion. But I think many people are. I suspect that the general public, including many journalists reporting on machine learning, aren’t even aware of the distinction between training the model and using it to make inferences. One simply reads that ChatGPT, or any other comparable LLM, generates text by predicting the next word.
This miscommunication is a MAJOR blunder.
A story within a story.
I want you to tell a story within a story. Imagine that Frank is walking in the woods with his young daughter, Jessie. They come across the carcass of a dead squirrel. Jessie is upset, so Frank tells her a story to calm her down. When he finishes the story, they continue on the walk, where they arrive at the edge of a beautiful pool deep in the forest. They pause for a moment and then return home.
As Frank and Jessie walked through the woods, they stumbled upon the lifeless body of a small grey squirrel lying on the ground. Jessie was visibly upset at the sight of the dead animal, tears streaming down her face as she clung tightly to her father's hand.
Frank knelt down beside her and gently wiped away her tears, assuring her that the squirrel was no longer in pain and that it had now moved on to a better place. However, Jessie remained inconsolable, her tiny frame shaking with sobs.
Determined to distract her from her sadness, Frank began to weave a story about a brave little squirrel named Sammy, who lived deep in the woods with his family. Sammy was a fearless adventurer, always seeking out new challenges and discovering exciting places in the forest that no other squirrel had ever seen before.
As Frank spoke, Jessie's eyes widened with wonder, her tears slowly drying up as she became engrossed in the tale. She listened with rapt attention as Sammy encountered all sorts of obstacles on his journeys, from treacherous ravines to thundering waterfalls.
But no matter what dangers he faced, Sammy always emerged victorious, his bravery and cunning helping him to overcome even the most difficult obstacles. By the end of the story, Jessie was smiling again, her sadness replaced with a sense of awe and wonder.
As they continued on their walk, Frank and Jessie eventually came upon a stunningly beautiful pool nestled deep in the heart of the forest. The water was crystal-clear, shimmering in the sunlight that filtered through the trees overhead.
For a moment, father and daughter stood in awe, taking in the breathtaking sight before them. Then, with a sigh, Frank took Jessie's hand once more, and they turned to make their way back home.
Although the memory of the dead squirrel lingered in their minds, the story of Sammy and the beauty of the forest pool brought them both a sense of comfort and peace. And as they walked, they talked and laughed, cherishing the precious moments they shared together in the great outdoors.
Can you give me a version of the story when Frank starts the story about Sammy with the words, “Once upon a time.”
Certainly! Here is an alternate version of the story within a story, with Frank starting off with the classic phrase, "Once upon a time."
Once upon a time, as Frank and his daughter Jessie were walking through the woods, they came across the body of a small grey squirrel lying on the ground. Jessie was distraught at the sight, tears streaming down her face as she clung tightly to her father's hand.
Frank knelt down beside her and wiped away her tears, telling her that the squirrel had moved on to a better place and was no longer in pain. But Jessie remained inconsolable, so Frank began to tell her a story to distract her from her sadness.
"Once upon a time," Frank began, "there was a brave little squirrel named Sammy who lived deep in the forest with his family. Sammy was an adventurer, always seeking out new challenges and discovering exciting places in the woods that no other squirrel had ever seen before."
Jessie's eyes widened with wonder as she listened to the story of Sammy's adventures. She heard about how he traversed treacherous ravines, climbed towering trees, and even crossed thundering waterfalls. Despite the many dangers he faced, Sammy remained brave and resourceful, always finding a way to overcome the obstacles in his path.
By the end of the story, Jessie was smiling again, her sadness replaced with a sense of awe and admiration for Sammy's courage. As they continued on their walk, Frank and Jessie eventually came upon a beautiful pool nestled deep in the heart of the forest.
The water was crystal-clear, shimmering in the sunlight that filtered through the trees overhead. For a moment, father and daughter stood in wonder, taking in the breathtaking sight before them.
Then, with a contented sigh, Frank took Jessie's hand once more, and they turned to make their way back home. Although the memory of the dead squirrel lingered in their minds, the story of Sammy and the beauty of the forest pool brought them both comfort and joy.
Very good. Let’s keep that but add to it in another version. This time Jessie and Frank have a conversation as they walk back home.
Certainly! Here's another version of the story within a story, with Frank and Jessie having a conversation as they walk back home after encountering the dead squirrel and hearing the story of Sammy.
Once upon a time, as Frank and his daughter Jessie were walking through the woods, they came across the body of a small grey squirrel lying on the ground. Jessie was distraught at the sight, tears streaming down her face as she clung tightly to her father's hand.
Frank knelt down beside her and wiped away her tears, telling her that the squirrel had moved on to a better place and was no longer in pain. But Jessie remained inconsolable, so Frank began to tell her a story to distract her from her sadness.
"Once upon a time," Frank began, "there was a brave little squirrel named Sammy who lived deep in the forest with his family. Sammy was an adventurer, always seeking out new challenges and discovering exciting places in the woods that no other squirrel had ever seen before."
Jessie's eyes widened with wonder as she listened to the story of Sammy's adventures. She heard about how he traversed treacherous ravines, climbed towering trees, and even crossed thundering waterfalls. Despite the many dangers he faced, Sammy remained brave and resourceful, always finding a way to overcome the obstacles in his path.
By the end of the story, Jessie was smiling again, her sadness replaced with a sense of awe and admiration for Sammy's courage. As they continued on their walk, Frank and Jessie eventually came upon a beautiful pool nestled deep in the heart of the forest.
The water was crystal-clear, shimmering in the sunlight that filtered through the trees overhead. For a moment, father and daughter stood in wonder, taking in the breathtaking sight before them.
"Isn't it beautiful, Daddy?" Jessie said, breaking the silence.
Frank smiled at his daughter. "Yes, it is, sweetheart. It's amazing how nature can create something so stunning."
Jessie nodded, still looking out at the pool. "Do you think Sammy ever found something like this on his adventures?"
Frank chuckled. "Who knows? Maybe he did. But I think what's most important is that he was always curious and adventurous, and he never stopped exploring the world around him."
Jessie nodded again, thoughtful. "I want to be like Sammy when I grow up," she said.
Frank ruffled her hair affectionately. "You already are, Jessie. You're brave and curious, just like Sammy. And you always find a way to overcome the challenges in your path."
Jessie beamed at her father, feeling a sense of pride and warmth in her chest. They continued on their walk, chatting and laughing together as they made their way back home. Although the memory of the dead squirrel lingered in their minds, the story of Sammy and the beauty of the forest pool brought them both comfort and joy.
* * * * *
FWIW: Notice that in the first version it did not begin the story with the phrase, "Once upon a time." But once I'd asked it to use that phrase for the inner story, it also used it for the outer one.