meta: it seems like the collapse feature doesn't work on mobile, and the table is hard to read (especially the first column)
it's more that the collapse feature doesn't work on LessWrong (it's from Holden's blog, which this is crossposted from)
I do think it'd be a good thing to build
Oooh, I would love to have Workflowy-style collapsible toggles on LW. That's a way to write essays which expand on small details, or which meander somewhat, without inconveniencing readers who just want to see the important points. (Side note: Notion furthermore has collapsible headings, which are also great.)
In the most important century series, I argued that the 21st century could be the most important century ever for humanity, via the development of advanced AI systems that could dramatically speed up scientific and technological advancement, getting us more quickly than most people imagine to a deeply unfamiliar future.
In this more recent series, I’ve been trying to help answer this question: “So what? What can I do to help?”
So far, I’ve just been trying to build a picture of some of the major risks we might face (especially the risk of misaligned AI that could defeat all of humanity), what might be challenging about these risks, and why we might succeed anyway. Now I’ve finally gotten to the part where I can start laying out tangible ideas for how to help (beyond the pretty lame suggestions I gave before).
This piece is about one broad way to help: spreading messages that ought to be more widely understood.
One reason I think this topic is worth a whole piece is that practically everyone can help with spreading messages at least some, via things like talking to friends; writing explanations of your own that will appeal to particular people; and, yes, posting to Facebook and Twitter and all of that. Call it slacktivism if you want, but I’d guess it can be a big deal: many extremely important AI-related ideas are understood by vanishingly small numbers of people, and a bit more awareness could snowball. Especially because these topics often feel too “weird” for people to feel comfortable talking about them! Engaging in credible, reasonable ways could contribute to an overall background sense that it’s OK to take these ideas seriously.
And then there are a lot of potential readers who might have special opportunities to spread messages. Maybe they are professional communicators (journalists, bloggers, TV writers, novelists, TikTokers, etc.), maybe they’re non-professionals who still have sizable audiences (e.g., on Twitter), maybe they have unusual personal and professional networks, etc. Overall, the more you feel you are good at communicating with some important audience (even a small one), the more this post is for you.
That said, I’m not excited about blasting around hyper-simplified messages. As I hope this series has shown, the challenges that could lie ahead of us are complex and daunting, and shouting stuff like “AI is the biggest deal ever!” or “AI development should be illegal!” could do more harm than good (if only by associating important ideas with being annoying). Relatedly, I think it’s generally not good enough to spread the most broad/relatable/easy-to-agree-to version of each key idea, like “AI systems could harm society.” Some of the unintuitive details are crucial.
Instead, the gauntlet I’m throwing down is: “find ways to help people understand the core parts of the challenges we might face, in as much detail as is feasible.” That is: the goal is to try to help people get to the point where they could maintain a reasonable position in a detailed back-and-forth, not just to get them to repeat a few words or nod along to a high-level take like “AI safety is important.” This is a lot harder than shouting “AI is the biggest deal ever!”, but I think it’s worth it, so I’m encouraging people to rise to the challenge and stretch their communication skills.
Below, I will:
Challenges of AI-related messages
Here’s a simplified story for how spreading messages could go badly.
More on the “competition” frame vs. the “caution” frame
In a previous piece, I talked about two contrasting frames for how to make the best of the most important century:
The “caution” frame. This frame emphasizes that a furious race to develop powerful AI could end up making everyone worse off. This could be via: (a) AI forming dangerous goals of its own and defeating humanity entirely; or (b) humans racing to gain power and resources and “lock in” their values.
Ideally, everyone with the potential to build sufficiently powerful AI would be able to pour energy into building something safe (not misaligned), and into carefully planning out (and negotiating with others on) how to roll it out, without a rush or a race. With this in mind, perhaps we should be doing things like:
The “competition” frame. This frame focuses less on how the transition to a radically different future happens, and more on who's making the key decisions as it happens.
This means it could matter enormously "who leads the way on transformative AI" - which country or countries, which people or organizations.
Some people feel that we can make confident statements today about which specific countries, and/or which people and organizations, we should hope lead the way on transformative AI. These people might advocate for actions like:
Tension between the two frames. People who take the "caution" frame and people who take the "competition" frame often favor very different, even contradictory actions. Actions that look important to people in one frame often look actively harmful to people in the other.
For example, people in the "competition" frame often favor moving forward as fast as possible on developing more powerful AI systems; for people in the "caution" frame, haste is one of the main things to avoid. People in the "competition" frame often favor adversarial foreign relations, while people in the "caution" frame often want foreign relations to be more cooperative.
That said, this dichotomy is a simplification. Many people - including myself - resonate with both frames. But I have a general fear that the “competition” frame is going to be overrated by default for a number of reasons, as I discuss here.
Unfortunately, I’ve seen something like the above story play out in multiple significant instances (though I shouldn’t give specific examples).
And I’m especially worried about this dynamic when it comes to people in and around governments (especially in national security communities), because I perceive governmental culture as particularly obsessed with staying ahead of other countries (“If AI is dangerous, we’ve gotta build it first”) and comparatively uninterested in things that are dangerous for our country because they’re dangerous for the whole world at once (“Maybe we should worry a lot about pandemics?”).1
You could even argue (although I wouldn’t agree!2) that to date, efforts to “raise awareness” about the dangers of AI have done more harm than good (via causing increased investment in AI, generally).
So it’s tempting to simply give up on the whole endeavor - to stay away from message spreading entirely, beyond people you know well and/or are pretty sure will internalize the important details. But I think we can do better.
This post is aimed at people who are good at communicating with at least some audience. This could be because of their skills, or their relationships, or some combination. In general, I’d expect to have more success with people who hear from you a lot (because they’re your friend, or they follow you on Twitter or Substack, etc.) than with people you reach via some viral blast of memery - but maybe you’re skilled enough to make the latter work too, which would be awesome. I'm asking communicators to hit a high bar: leave people with strong understanding, rather than just getting them to repeat a few sentences about AI risk.
Messages that seem risky to spread in isolation
First, here are a couple of messages that I’d rather people didn’t spread (or at least have mixed feelings about spreading) in isolation, i.e., without serious efforts to include some of the other messages I cover below.
One category is messages that generically emphasize the importance and potential imminence of powerful AI systems. The reason for this is in the previous section: many people seem to react to these ideas (especially when unaccompanied by some other key ones) with a “We’d better build powerful AI as fast as possible, before others do” attitude. (If you’re curious about why I wrote The Most Important Century anyway, see footnote for my thinking.3)
Another category is messages that emphasize that AI could be risky/dangerous to the world, without much effort to fill in how, or with an emphasis on easy-to-understand risks.
Messages that seem important and helpful (and right!)
We should worry about conflict between misaligned AI and all humans
Unlike the messages discussed in the previous section, this one directly highlights why it might not be a good idea to rush forward with building AI oneself.
The idea that an AI could harm the same humans who build it has very different implications from the idea that AI could be generically dangerous/powerful. Less “We’d better get there before others,” more “there’s a case for moving slowly and working together here.”
The idea that AI could be a problem for the same people who build it is common in fictional portrayals of AI (HAL 9000, Skynet, The Matrix, Ex Machina) - maybe too much so? It seems to me that people tend to balk at the “sci-fi” feel, and what’s needed is more recognition that this is a serious, real-world concern.
The main pieces in this series making this case are Why would AI “aim” to defeat humanity? and AI could defeat all of us combined. There are many other pieces on the alignment problem (see list here); also see Matt Yglesias's case for specifically embracing the “Terminator”/Skynet analogy.
I’d be especially excited for people to spread messages that help others understand - at a mechanistic level - how and why AI systems could end up with dangerous goals of their own, deceptive behavior, etc. I worry that by default, the concern sounds like lazy anthropomorphism (thinking of AIs just like humans).
Transmitting ideas about the “how and why” is a lot harder than getting people to nod along to “AI could be dangerous.” I think there’s a lot of effort that could be put into simple, understandable, relatable metaphors/analogies/examples (my pieces make some effort in this direction, but there’s tons of room for more).
AIs could behave deceptively, so “evidence of safety” might be misleading
I’m very worried about a sequence of events like:
I worry about AI systems’ being deceptive in the same way a human might: going through chains of reasoning like “If I do X, I might get caught, but if I do Y, no one will notice until it’s too late.” But it can be hard to get this concern taken seriously, because it means attributing behavior to AI systems that we currently associate exclusively with humans (today’s AI systems don’t really do things like this4).
One of the central things I’ve tried to spell out in this series is why an AI system might engage in this sort of systematic deception, despite being very unlike humans (and not necessarily having e.g. emotions). It’s a major focus of both of these pieces from this series:
Whether this point is widely understood seems quite crucial to me. We might end up in a situation where (a) there are big commercial and military incentives to rush ahead with AI development; (b) we have what seems like a set of reassuring experiments and observations.
At that point, it could be key whether people are asking tough questions about the many ways in which “evidence of AI safety” could be misleading, which I discussed at length in AI Safety Seems Hard to Measure.
Why AI safety could be hard to measure
In previous pieces, I argued that:
When dealing with an intelligent agent, it’s hard to tell the difference between “behaving well” and “appearing to behave well.”
When professional cycling was cracking down on performance-enhancing drugs, Lance Armstrong was very successful and seemed to be unusually “clean.” It later came out that he had been using drugs with an unusually sophisticated operation for concealing them.
The AI is (actually) well-behaved when humans are in control. Will this transfer to when AIs are in control?
It's hard to know how someone will behave when they have power over you, based only on observing how they behave when they don't.
AIs might behave as intended as long as humans are in control - but at some future point, AI systems might be capable and widespread enough to have opportunities to take control of the world entirely. It's hard to know whether they'll take these opportunities, and we can't exactly run a clean test of the situation.
Like King Lear trying to decide how much power to give each of his daughters before abdicating the throne.
Today's AI systems aren't advanced enough to exhibit the basic behaviors we want to study, such as deceiving and manipulating humans.
Like trying to study medicine in humans by experimenting only on lab mice.
Imagine that tomorrow's "human-like" AIs are safe. How will things go when AIs have capabilities far beyond humans'?
AI systems might (collectively) become vastly more capable than humans, and it's ... just really hard to have any idea what that's going to be like. As far as we know, there has never before been anything in the galaxy that's vastly more capable than humans in the relevant ways! No matter what we come up with to solve the first three problems, we can't be too confident that it'll keep working if AI advances (or just proliferates) a lot more.
Like trying to plan for first contact with extraterrestrials (this barely feels like an analogy).
An analogy that incorporates these challenges is Ajeya Cotra’s “young businessperson” analogy:
If your applicants are a mix of "saints" (people who genuinely want to help), "sycophants" (people who just want to make you happy in the short run, even when this is to your long-term detriment) and "schemers" (people who want to siphon off your wealth and power for themselves), how do you - an eight-year-old - tell the difference?
More: AI safety seems hard to measure
AI projects should establish and demonstrate safety (and potentially comply with safety standards) before deploying powerful systems
I’ve written about the benefits we might get from “safety standards.” The idea is that AI projects should not deploy systems that pose too much risk to the world, as evaluated by a systematic evaluation regime: AI systems could be audited to see whether they are safe. I've outlined how AI projects might self-regulate by publicly committing to having their systems audited (and not deploying dangerous ones), and how governments could enforce safety standards both nationally and internationally.
Today, development of safety standards is in its infancy. But over time, I think it could matter a lot how much pressure AI projects are under to meet safety standards. And I think it’s not too early, today, to start spreading the message that AI projects shouldn’t unilaterally decide to put potentially dangerous systems out in the world; the burden should be on them to demonstrate and establish safety before doing so.
How standards might be established and become national or international
I previously laid out a possible vision on this front, which I’ll give a slightly modified version of here:
Alignment research is prosocial and great
Most people reading this can’t go and become groundbreaking researchers on AI alignment. But they can contribute to a general sense that the people who can do this (mostly) should.
Today, my sense is that most “science” jobs are pretty prestigious, and seen as good for society. I have pretty mixed feelings about this:
I wish there were more effort, generally, to distinguish between especially dangerous science and especially beneficial science. AI alignment seems squarely in the latter category.
I’d be especially excited for people to spread messages that give a sense of the specifics of different AI alignment research paths, how they might help or fail, and what’s scientifically/intellectually interesting (not just useful) about them.
The main relevant piece in this series is High-level hopes for AI alignment, which distills a longer piece (How might we align transformative AI if it’s developed very soon?) that I posted on the Alignment Forum.
There are a number (hopefully growing) of other careers that I consider especially valuable, which I'll discuss in my next post on this topic.
It might be important for companies (and other institutions) to act in unusual ways
In Racing through a Minefield: the AI Deployment Problem, I wrote:
It always makes me sweat when I’m talking to someone from an AI company and they seem to think that commercial success and benefiting humanity are roughly the same goal/idea. (To be clear, I don't think an AI project's only goal should be to avoid the risk of misaligned AI. I've given this risk a central place in this piece partly because I think it's especially at risk of being too quickly dismissed - but I don't think it's the only major risk. I think AI projects need to strike a tricky balance between the caution and competition frames, and consider a number of issues beyond the risk of misalignment. But I think it's a pretty robust point that they need to be ready to do unusual things rather than just following commercial incentives.)
I’m nervous about a world in which:
At a minimum (as I argued previously), I think AI companies should be making sure they have whatever unusual governance setups they need in order to prioritize benefits to humanity - not returns to shareholders - when the stakes get high. I think we’d see more of this if more people believed something like: “It might be important for companies (and other institutions) to act in unusual ways.”
We’re not ready for this
If we’re in the most important century, there’s likely to be a vast set of potential challenges ahead of us, most of which have gotten very little attention. (More here: Transformative AI issues (not just misalignment): an overview)
If it were possible to slow everything down, by default I’d think we should. Barring that, I’d at least like to see people approaching the topic of AI with a general attitude along the lines of “We’re dealing with something really big here, and we should be trying really hard to be careful and humble and thoughtful” (as opposed to something like “The science is so interesting, let’s go for it” or “This is awesome, we’re gonna get rich” or “Whatever, who cares”).
I’ll re-excerpt this table from an earlier piece:
I’m not at all sure about this, but one potential way to spread this message might be to communicate, with as much scientific realism, detail and believability as possible, about what the world might look like after explosive scientific and technological advancement brought on by AI (for example, a world with digital people). I think the enormous unfamiliarity of some of the issues such a world might face - and the vast possibilities for utopia or dystopia - might encourage an attitude of not wanting to rush forward.
How to spread messages like these?
I’ve tried to write a series that explains the key issues to careful readers, hopefully better equipping them to spread helpful messages. From here, individual communicators need to think about the audiences they know and the mediums they use (Twitter? Facebook? Essays/newsletters/blog posts? Video? In-person conversation?) and what will be effective with those audiences and mediums.
The main guidelines I want to advocate:
Footnotes
Killer Apps and Technology Roulette are interesting pieces trying to sell policymakers on the idea that “superiority is not synonymous with security.” ↩
When I imagine what the world would look like without any of the efforts to “raise awareness,” I picture a world with close to zero awareness of - or community around - major risks from transformative AI. While this world might also have more time left before dangerous AI is developed, on balance this seems worse. A future piece will elaborate on the many ways I think a decent-sized community can help reduce risks. ↩
I do think “AI could be a huge deal, and soon” is a very important point that somewhat serves as a prerequisite for understanding this topic and doing helpful work on it, and I wanted to make this idea more understandable and credible to a number of people - as well as to create more opportunities to get critical feedback and learn what I was getting wrong.
But I was nervous about the issues noted in this section. With that in mind, I did the following things:
I don’t claim to be sure I got all the tradeoffs right. ↩
There are some papers arguing that AI systems do things something like this (e.g., see the “Challenges” section of this post), but I think the dynamic is overall pretty far from what I’m most worried about. ↩
E.g., public benefit corporation ↩