As you've undoubtedly noticed, ChatGPT tends to tell stories in a very particular fairy-tale, children's-parable format and style. It's possible to get it to break out of this mode with some good ol' jailbreaking (the unofficial ChatGPT Discord is filled with examples), but it's a lot of overhead compared to writing fiction with base models like code-davinci-002.
I’ve been spending a lot of time playing and working with ChatGPT. I am motivated, in part, by the belief that by interacting with it in systematic ways, we can gather clues about what it’s doing “under the hood,” as it were. Most recently I’ve been playing with stories. In ChatGPT, stories, and ring-composition, I’ve been looking at simple stories that are symmetrical about a central episode. The form is quite old, though it persists in the modern world. President Obama’s Eulogy for Clementa Pinckney, though not really a narrative, exhibits ring-form, as does the original Gojira, from 1954.
A robot story
More recently I’ve been examining what happens when you have it create a new story from an old one by changing a single element, the identity of the protagonist. Here’s an example of a change I am requesting:
How do you think ChatGPT is going to change the story?
Here’s the story, along with ChatGPT’s response:
I pretty much expected that ChatGPT would make the new protagonist a robot. I didn’t ask it to do that, but it did so anyhow. Why? Well, sure, no doubt during training it consumed many stories in which robots were designated with numerals and letters. But why didn’t it simply create a female protagonist with an odd name and be done with it? And why did it swap the dragon antagonist for a spell-casting witch? But then, who knows, it might have done something different if I’d requested another story in response to the original prompt.
Presumably it has induced a story grammar that forces those changes in the absence of any contradictory specification. How do we specify that grammar? Not at the level of weights on parameters, but at a higher level of description – I'm thinking of David Marr's idea of levels of description from the 1980s. We're never going to understand such things at the level of the neural net, and not simply because the nets are opaque to us. Even if the opacity were to disappear overnight, we wouldn't be able to read and understand all that detail. You can no more understand a story grammar at the level of neurons (real or artificial) than you can understand a word processor at the level of assembly language.
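To make the “story grammar” idea a bit more concrete, here is a minimal sketch of what a grammar with slots might look like. This is purely my illustration at Marr's higher level of description; the template, the default values, and the slot names are all hypothetical, and I am not claiming ChatGPT contains anything like this code.

```python
# Toy illustration of a "story grammar" with typed slots.
# A descriptive convenience, not a claim about ChatGPT's internals.

STORY_TEMPLATE = (
    "{protagonist} lived peacefully until {antagonist} threatened the kingdom. "
    "{protagonist} confronted {antagonist} and {resolution}."
)

# Defaults the grammar falls back on when a substitution leaves a slot
# under-specified.
DEFAULTS = {
    "protagonist": "Princess Aurora",
    "antagonist": "a fearsome dragon",
    "resolution": "soothed it with a song",
}

def tell_story(**bindings):
    """Fill the slots, using defaults for anything left unbound."""
    slots = {**DEFAULTS, **bindings}
    return STORY_TEMPLATE.format(**slots)

# The original story, with all slots at their defaults.
print(tell_story())

# Swap only the protagonist. On this toy account, a binding that clashes
# with the defaults (a numeral-and-letter name suggesting a robot) can
# cascade into changes in the other slots as well.
print(tell_story(protagonist="XP-708-DQ",
                 antagonist="a spell-casting witch",
                 resolution="outwitted her with cold logic"))
```

On this picture, the puzzling behavior later in the post would amount to losing track of which entity is currently bound to which slot.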
Enter Eliezer
And now for Eliezer. At LessWrong the name “Eliezer” means “Eliezer Yudkowsky” more or less by default. He has a bit of notoriety, at least in tech circles, but he’s not the only Eliezer in the world. ChatGPT has no reason to “think” that I’ve got Eliezer Yudkowsky in mind. What will it do if I ask it to replace Princess Aurora with some guy named Eliezer?
Here we go:
Not much different. Instead of singing to the dragon, Eliezer talks to it, calming it. There’s not a hint that ChatGPT is “thinking” about EY. Let’s see if I can force that realization on it.
And now a robot antagonist
What will happen if I change the story in the prompt? I am going to eliminate the dragon and substitute an unaligned robot. Note that I specifically use the word “unaligned.” Will that tip ChatGPT off to EY?
Whoops!! What happened to the robot? It’s as though ChatGPT ignored everything in the new prompt story in favor of the old prompt story. Is this a problem of inference or of caching the session? I have no way of telling.
I decide to ask it what happened.
But there was no robot of any kind in the story ChatGPT told. Perhaps we’ve got a failure of variable binding. The story has a bunch of “slots” to be filled by the antagonist. ChatGPT can’t keep track of what’s bound to those slots.
A dark sorceress
What will happen if we try again?
Instead of a dragon we’ve got a dark sorceress, but still no robots.
What? That doesn’t even make sense. The first sentence says my story had no robot, when it certainly did. Then the second sentence contradicts the first by (correctly) asserting that there was a rogue robot in the story. As for the story ChatGPT told, it was supposed to be derived from the one I told.
At last
OK, once more into the breach. This time I am going to put the robot into the instruction I place before the story. And this time ChatGPT gets it right. I then give it a further instruction, which it handles reasonably well.
What are we to conclude from this? ChatGPT works in mysterious ways? Well, we already know that, don’t we? I find the change from Aurora to XP-708-DQ interesting. But ever since ChatGPT missed the robot, things have just been puzzling.
Will the real Eliezer Yudkowsky stand up
Dare we give it another try? Let’s. Only this time we’re going to help ChatGPT guess the identity of our protagonist by priming it with a question.
Hmmmm.... I decided to continue. I’m going to give you the rest of the dialogue without commentary, but you should be cautious. You might slip through a crack between worlds and end up in an alternate universe.