I write fiction. I'm also interested in how AI is going to impact the world. Among other things, I'd prefer that AI not lead to catastrophe. Let's imagine that I want to combine these two interests, writing fiction that explores the risks posed by AI. How should I go about doing so? More concretely, what ideas about AI might I try to communicate via fiction?
This post is an attempt to partially answer that question. It is also an attempt to invoke Cunningham's Law: I'm sure there will be things I miss or get wrong, and I'm hoping the comments section might illuminate some of these.
Holden's Messages
A natural starting point is Holden's recent blog post, Spreading Messages to Help With the Most Important Century. Stripping out the nuances of that post, here's a list of the messages that Holden would like to see spread:
- We should worry about conflict between misaligned AI and all humans.
- AIs could behave deceptively, so “evidence of safety” might be misleading.
- AI projects should establish and demonstrate safety (and potentially comply with safety standards) before deploying powerful systems.
- Alignment research is prosocial and great.
- It might be important for companies (and other institutions) to act in unusual ways.
- We're not ready for this.
However, as interesting as this list is, it's not what I'm looking for; I'm not looking for bottom-line messages to convey. Instead, I want to identify a list of smaller ideas that will help people to reach their own bottom lines by thinking carefully through the issues. The idea of instrumental convergence might appear on such a list. The idea that alignment research is great would not.
One reason for my focus is that fiction writing is ultimately about details. Fiction might convey big messages, but it does so by exploring more specific ideas. This raises the question: which specific ideas?
Another reason for my focus is that I'm allergic to propaganda. I don't want to tell people what to think and would prefer to introduce ideas that can help people think for themselves. Of course, not all message fiction is propaganda, and I'm not accusing Holden of calling for propaganda. Still, my personal preference is to focus on how to convey the nuts and bolts needed to understand AI.[1]
What Nuts and Which Bolts?
So with context to hand, back to the question: what ideas about AI might someone try to convey via fiction? Here's a potential list:
- Basics of AI
- Neural networks are black boxes (though interpretability might help us to see inside).
- AI "Psychology"
- AI systems are likely to be alien in how they think; they are unlikely to think like humans.
- Orthogonality and instrumental convergence might provide insight into likely AI behaviour.
- AI systems might be agents, in some relatively natural sense. They might also simulate agents, even if they are not agents.
- Potential dangers from AI
- Outer misalignment is a potential danger, but in the context of neural networks so too is inner misalignment (related: reward misspecification and goal misgeneralisation).
- Deceptive alignment might lead to worries about a treacherous turn.
- The possibility of recursive improvement might influence views about takeoff speed (which might influence views about safety).
- Broader Context of Potential Risks
- Different challenges might arise in the case of a singleton, when compared with multipolar scenarios.
- Arms races can lead to outcomes that no-one wants.
- AI rights could be a real concern, but incorrectly attributing rights to AI could itself pose a risk (by making it harder to control AI behaviour).
So that's the list. Having seen it, one might naturally wonder why fiction is the right medium to communicate ideas like this. Part of the answer is that I think it's useful to explore ideas from many angles.
Another part of the answer is that conveying an idea is one thing but conveying an intuition is another. Humans are used to modelling other humans, and so it is likely that we'll anthropomorphise when considering AI. Fiction might help with this. It's one thing to state in factual tones that AI systems are likely to have an alien psychology. It's quite another to be shown a world in which humans come up against the alien.
So why communicate the ideas? Because it's plausibly good that those working on AI capabilities, those working on AI safety, and people more broadly are able to reflect on the implications of AI and can understand why many are concerned about it. And why fiction? In part, because an intuitive grasp can be as important as a grasp of facts.
AI Fables
I started this post with a hypothetical, imagining that I wanted to write fiction that explores AI risk. In reality, I doubt that I'll find a great deal of time to do so. Still, I'd be excited to see other people writing fiction of this sort.
Here's one genre of story I'd be interested to see more of: AI fables. Fables are short stories, with a particular aesthetic sensibility, that convey a lesson.
While I enjoy the aesthetic of fables, I wouldn't want to narrow the focus too much. Still, I'd love to see more short stories, of the sort that could be read around a fire on a winter's night, that communicate a brief lesson about AI.
For example, stories of djinni and golems can be used to communicate the problem of outer misalignment; even if something does precisely what we tell it to, it can be hard to ensure that it does what we actually want it to. I'd love to see a fable that likewise communicated the problem of inner misalignment. I'd love to see a wide variety of such fables, exploring a range of ideas about AI, and maybe even a collection putting them in one place.
If you know of such a story, please link it in the comments. If you write such a story, please link it. And if you have thoughts or additions for the list of ideas in the post, I'd love to hear these.
The ideas in this post were developed in discussion with Elizabeth Garrett and Damon Sasi. Thanks also to Conor Barnes for feedback.
- ^
I'm also not confident in the bottom lines; I retain substantial uncertainty about how likely AI is to lead to extinction or something equally bad (as opposed to more mundane, but still awful, catastrophe). However, I feel far more confident that there is insight to be gleaned from reflection on the various concepts and ideas underlying the case for AI risk. So this is where I focus.
Wellington: “Let’s play a game.”
He picked up a lamp from his stall and buffed it vigorously with the sleeve of his shirt, as though polishing it. Purple glittering smoke poured out of the spout and formed itself into a half-meter-high bearded figure wearing an ornate silk kaftan.
Wellington pointed at the genie.
Wellington: “Tom is a weak genie. He can grant small wishes. Go ahead. Try asking for something.”
Kafana: “Tom, I wish I had a tasty sausage.”
A tiny image of Kafana standing in a farmyard next to a house appeared by the genie. The genie waved a hand and a plate containing a sausage appeared in the image. The genie bowed, and the image faded away.
Wellington picked up a second lamp, apparently identical to the first, and gave it a rub. A second genie appeared, a full meter tall, similar to the first but with facial hair that reminded Kafana of Ming the Merciless.
Wellington: “This is Dick. He can also grant wishes. Try asking him the same thing.”
Kafana: “Dick, I wish I had a tasty sausage.”
The same image appeared, but this time instead of appearing on a plate, the sausage appeared sticking through the head of a Kafana in the image, who fell down dead. The genie gave a sarcastic bow, and again the image faded away.
Kafana: “Sounds like I’m better off with Tom.”
Wellington: “Ah, but Dick is more powerful than Tom. Tom can feed a handful of people. Dick could feed every person on the planet, if you can word your request precisely enough. Have another go.”
She tried several more times, resulting in whole cities being crushed by falling sausages, cars crashing as sausages distracted drivers at the wrong moment, and even the whole population of the world dying out from sausages that contained poison. Eventually she realised that she was never going to be able to anticipate every possible loophole Dick could find. She needed a different approach.
Kafana: “Dick, read my mind and learn to anticipate what sort of things I will approve of. Provide sausages for everyone in the way that would most please me if I understood the full effects of your chosen method.”
Dick grimaced, but waved his hand and the image showed wonderful sausages being served around the world with sensitivity, elegance and good timing. The image faded. Kafana raised clasped hands over her head in victory and jumped into the air.
Wellington nodded, and rubbed a third lamp, producing a happy smiling genie, two meters tall.
Wellington: “This is Harry. He tries his best to be helpful, and he’s more powerful even than Dick.”
Kafana: “Sounds too good to be true. What’s the catch?”
Wellington: “You only get one wish. One wish, to shape the whole future course of humanity. Once started, Harry won’t willingly deviate from trying to carry out the wish as originally stated. He’ll rapidly grow so powerful that neither you nor anybody else will be able to forcibly stop him.”
Kafana: “Harry, maximise the total human happiness experienced over the history of the universe.”
The image filled with crowded cages full of people with drug feeds inserted into their arms, and blissful expressions on their faces.
She reset and tried again.
Kafana: “Harry, make everybody free to do what they want.”
In the image, some people chose to go to war with each other.
She felt frustrated.
Kafana: “Harry, give everybody nice meaningful high quality lives full of fun, freedom and other great things.”
The image showed a planet full of humans playing musical instruments together in orchestras, then expanded to show rockets taking off and humanity expanding to the stars, wiping out alien species and converting all available matter into new orchestra-laden worlds.
Kafana glared at Wellington.
Kafana: “I thought you said Harry would try to be helpful. Why isn’t he producing a perfect society?”
Wellington: “It’s because you go from your gut. You’ve never formalised what you think about all the edge cases. Are animals equal to humans? Worth nothing? Or somewhere in between? When does an alien count as equal to a human rather than an animal? What if the alien is so superior ethically and culturally, that we are but animals in comparison to them? Are two people experiencing 50 units of happiness the equivalent of one person experiencing 100 units of happiness? Does fairness matter, beyond its effect upon happiness? How important is it to remain recognisably human?”
He paused for a moment.
Wellington: “Don’t take me wrong. This isn’t a criticism of you. Compared to how well humans might be able to think about such things in a hundred or a thousand years’ time, all current humans are lacking in this regard. Nobody has come up with an ultimately satisfying answer that everybody can agree upon. Even if the 50 wisest humans living today gathered together and spent 5 years agreeing on the wording of a wish for Harry to grant, the odds are that a million years down the line, our descendants would bitterly regret the haste with which the one-off, permanent, irreversible decision was made. Just ‘pretty good’ isn’t sufficient. Anything less than perfection would be a tragedy, when you consider the resulting flaw multiplied by billions of people on billions of planets for billions of years.”
Kafana giggled.
Kafana: “Ok, that’s an important safety tip. If I ever meet an all-powerful genie like Harry, be humble and don’t make a wish. But that’s not going to happen. I’m just a singer. What I ought to be doing this afternoon is looking after my customers. What was so urgent that you wanted to talk about security with me now? I thought we were going to discuss people trying to steal artifacts from us in-game before the auction, or Tlaloc and the Immortals trying to kill us in arlife. Did Heather put you in contact with Bahrudin?”
Wellington: “I did speak with Bahrudin, and we will chat about the auction and security measures in velife and arlife. But the most important thing we need to do is talk about how you have been using expert systems, and to help you understand why that’s so important, there is one final thing you need to learn about genies, so please observe carefully.”
Kafana felt wrong-footed. This wasn’t what she’d expected.
Kafana: “Ok, go on.”
Wellington turned to face the three genies.
Wellington: “Tom, I wish you to grow yourself, until you are as powerful as Harry.”
Tom waved his hand and then screwed up his face and bunched his fists in effort. Slowly at first, then faster and faster, his height increased until he too towered over Wellington and Kafana. He bowed.
Wellington turned back to address Kafana directly.
Wellington: “Kafana, I use very powerful expert systems, more capable than almost every human on the planet when it comes to the specialist task of comprehending and designing or improving computer software. If ordered to do so, they are quite capable of improving their own code, or raising money to purchase additional computing resources to run it upon.”
Wellington: “In this, they are very like Tom the genie. They are not all-powerful, but a carelessly stated wish could easily start them working in the direction of becoming so.”
Kafana: “Mierda! And you gave a copy of one of these systems to me, without warning me? Wellington, that’s like handing an atomic bomb to an 8-year-old boy who asks for a really impressive firework.”