Epistemic status: these are my own opinions on AI risk communication, based primarily on my own instincts on the subject and discussions with people less involved with rationality than myself. Communication is highly subjective and I have not rigorously A/B tested messaging. I am even less confident in the quality of my responses than in the correctness of my critique.

If they turn out to be true, these thoughts can probably be applied to all sorts of communication beyond AI risk.

Lots of work has gone into trying to explain AI risk to laypersons. Overall, I think it's been great, but there's a particular trap that I've seen people fall into a few times. I'd summarize it as simplifying and shortening the text of an argument without enough thought for the information content. It comes in three forms. One is forgetting to adapt concepts for someone at a large inferential distance; another is forgetting to filter for the important information; the third is rewording an argument so much you fail to sound like a human being at all.

I'm going to critique three examples which I think typify these:

Failure to Adapt Concepts

I got this from the summaries of AI risk arguments written by Katja Grace and Nathan Young here. I'm making the assumption that these summaries are supposed to be accessible to laypersons, since most of them seem written that way. This one stands out as not having been optimized on the concept level. This argument was of below-average effectiveness when tested.

I expect most people's reaction to point 2 would be "I understand all those words individually, but not together". It's a huge dump of conceptual information all at once which successfully points to the concept in the mind of someone who already understands it, but is unlikely to introduce that concept to someone's mind.

Here's an attempt to do better:

  1. So far, humans have mostly developed technology by understanding the systems which the technology depends on.
  2. AI systems developed today are instead created by machine learning. This means that the computer learns to produce certain desired outputs, but humans do not tell the system how it should produce the outputs. We often have no idea how or why an AI behaves in the way that it does.
  3. Since we don't understand how or why an AI works a certain way, it could easily behave in unpredictable and unwanted ways.
  4. If the AI is powerful, then the consequences of unwanted behaviour could be catastrophic.

And here's Claude's attempt, just for fun:

  1. Up until now, humans have created new technologies by understanding how they work.
  2. The AI systems made in 2024 are different. Instead of being carefully built piece by piece, they're created by repeatedly tweaking random systems until they do what we want. This means the people who make these AIs don't fully understand how they work on the inside.
  3. When we use systems that we don't fully understand, we're more likely to run into unexpected problems or side effects.
  4. If these not-fully-understood AI systems become very powerful, any unexpected problems could potentially be really big and harmful.

I think it does points 1 and 3 better than mine, but 2 and 4 worse. Either way, I think we can improve upon the summary.

Failure to Filter Information

When you condense an argument down, you make it shorter. This is obvious. What is not always as obvious is that this means you have to throw out information to make the core point clearer. Sometimes the information that gets kept is distracting. Here's an example from a poster a friend of mine made for Pause AI:

A poster explaining how Narrow AI learns to play chess by playing chess games, but AGI invents a chess AI and uses it as a tool

When I showed this to my partner, they said "This is very confusing, it makes it look like an AGI is an AI which makes a chess AI". Making more AIs is part of what AGI could do, but it's not really the central difference between narrow AI and AGI. The core property of an AGI is being capable at lots of different tasks.

Let's try and do better, though this is difficult to explain:

Examples of narrow AIs: a chess AI, a chemistry AI, and a coding AI, captioned: "Narrow AI learns to do a specific task by being trained on that task, such as playing chess or writing computer code. Narrow AI has a limited scope, so the overall risks are limited." An example of AGI doing several tasks, captioned: "AGI is trained on diverse data and learns to do many different tasks. It could plan and reason, even make more AIs. This means the risks from AGI are much larger than from narrow AI."

This one is not my best work, especially on the artistic front. It's a difficult concept to communicate! But I think this fixes the specific issue of information filtering: narrow AIs do a single, bounded task; AGI can do a broad range of tasks.

Failure to Sound Like a Human Being

In this case, the writing is so compressed and removed from the original (complicated) concept that it breaks down and needs to be rewritten from the ground up. Here's a quick example from the same page (sorry Katja and Nathan! Your arguments are just the easiest examples to find, and I really really do love the work you're doing). This is from the "Second Species Argument", which was of middling effectiveness, though this is a secondary example and not the core argument.

This is just ... an odd set of sentences. We get both of the previous errors for free here too. "An orangutan uses a stick to control juice" is poor information filtering: why does it matter that an orangutan can use a tool? "Should orangutans have felt safe inventing humans" is an unnecessarily abstract question; why not just ask whether orangutans have benefited from the existence of humans or not?

But moreover, the whole thing is one of the strangest passages of text I've ever read! "An orangutan uses a stick to control juice, while humans ... control the orangutan" is a really abstract and uncommon use of the word "control" which makes no sense outside of deep rationalist circles, and also sounds like it was written by aliens. Here's my attempt to do better:

A photo of a chimpanzee (San Francisco Zoo & Gardens), captioned:
Chimpanzees are physically stronger and more agile than humans, but because we're more intelligent, we're more powerful. We can destroy their habitats or put them in zoos. Are chimps better off because a more intelligent species than them exists?

For a start, I'd use a chimp instead of an orangutan, because they're a less weird animal and a closer relative to humans, which better makes our point. I then explain that we're dominant over chimps due to our intelligence, and give examples. Then instead of asking "should chimps have invented humans" I ask "Are chimps better off because a more intelligent species than them exists?" which doesn't introduce a weird hypothetical surrounding chimps inventing humans.

Summary

It's tempting to take the full, complicated knowledge structure you (i.e. a person in the 99.99th percentile of time spent thinking about a topic) want to express, and try and map it one-to-one onto a layperson's epistemology. Unfortunately, this isn't generally possible to do when your map of (this part of) the world has more moving parts than theirs. Often, you'll have to first convert your fine-grained models to coarse-grained ones, and filter out extraneous information before you can map the resulting simplified model onto their worldview.

A diagram attempting to explain the above paragraph visually. No novel information here for screen-reader users.
On the off chance that this diagram helps, I might as well put it in.

One trick I use is to imagine the response someone would give if I succeeded in explaining the concepts to them, and then asked them to summarize what they've learned back to me. I'm pretending to be my target audience passing an ideological Turing test of my own views: "What would they say that would convince me they understood me?" Mileage may vary.

Comments (15)
Raemon:

"Should orangutans have felt save inventing humans" is an unnecessarily abstract question, why not just ask whether orangutans have benefited from the existence of humans or not.

 

I'm not sure I can model what a lay person would think, but, fwiw I think the "should orangatans have invented humans?" much more direct as an intuition pump here. Yes it's a bit abstract, but, it more directly prompts me to think "we may be inventing AI that is powerful relative to us the way we're powerful relative to chimps."

It probably depends on whom you're communicating with. I guess there are people who aren't used to such analogies or thought experiments, and would immediately think: "This is a silly question, orangutans cannot invent humans!", and the same people would still think about the question in the way you intend if you break it down into several steps.

I actually agree with the normal person here, though I'd rephrase it to "This is a silly question, orangutans did not invent humans!", primarily because of the many disanalogies between the evolution of some chimps/gorillas/orangutans into humans, and the ways AI companies train/invent their AIs, and I'd have a similar reaction if chimpanzees or gorillas were used as the examples.

The evolution of humans from chimpanzees, gorillas and orangutans provides ~0 bits of evidence for AI outcomes, and whatever happens on AI, it will be for very different reasons than the second species argument gives us.

Great post!

  2. AI systems developed today are instead created by machine learning. This means that the computer learns to produce certain desired outputs, but humans do not tell the system how it should produce the outputs. We often have no idea how or why an AI behaves in the way that it does. [...]
  2. The AI systems made in 2024 are different. Instead of being carefully built piece by piece, they're created by repeatedly tweaking random systems until they do what we want. This means the people who make these AIs don't fully understand how they work on the inside.

I think Claude's version of this point is better, mainly due to it not using the word "output"; that's a programming/computer science term that I expect the average person to not understand (at least not in this kind of a context). "Do what we want" is much clearer.

As the author of example 2, this is very helpful!

I like what you're trying to do here. I think this is important work.

I'm a bit confused about what you mean by 'layperson', though. These are good for the 'everyday', above-average-intelligence, 'switched on' type of individual.

But that is not what I imagine a layperson as. I interact regularly with ~100 people. (For context, I am a Drama Teacher and Trivia Host.)

I thought about how many I predict could understand these examples, given 20 seconds of their attention. I thought of 10 people. The other 90% would fall into a few other categories that all end with them not being more knowledgeable after coming across the text.

But am I confused? Was that 90% not the target audience?

Yeah, I agree we need improvement. I don't know how many people it's important to reach, but I am willing to believe you that this will hit maybe 10%. I expect the 10% to be people with above-average impact on the future, but I don't know what %age of people is enough.

90% is an extremely ambitious goal. I would be surprised if 90% of the population can be reliably convinced by logical arguments in general.

Yep! If I think about those 10 people, 5 are having, or I expect to have, a large impact on the future. As for ages, all the people I thought of except one were over 20. There was one 14yo who is just naturally super high G.

Another nice example of "sound[ing] like a human being" is Stuart Russell's explanation of "the gorilla problem" in the book Human Compatible. Quoting directly from the start of chapter 5:

It doesn't require much imagination to see that making something smarter than yourself could be a bad idea. We understand that our control over the environment and over other species is a result of our intelligence, so the thought of something else being more intelligent than us---whether it's a robot or an alien---immediately induces a queasy feeling.

Around ten million years ago, the ancestors of the modern gorilla created (accidentally, to be sure) the genetic lineage leading to modern humans. How do the gorillas feel about this? Clearly, if they were able to tell us about their species' current situation vis-à-vis humans, the consensus opinion would be very negative indeed. Their species has essentially no future beyond that which we deign to allow. We do not want to be in a similar situation vis-à-vis superintelligent machines. I'll call this the gorilla problem---specifically, the problem of whether humans can maintain their supremacy and autonomy in a world that includes machines with substantially greater intelligence.

I wrote a two-paragraph argument for AI risk a while back. Does it work?

I think there is an easier way to get the point across by focusing not on self-improving AI, which is hard to understand, but on something everyone already understands: AI will make it easier for rich people to exploit everyone else. Right now, dictators still have to spend effort on keeping their subordinates happy or else they will be overthrown. And those subordinates have to spend effort on keeping their own subordinates from rebelling, too. That way you get at least a small incentive to keep other people happy.

Once a dictator has an AI servant, all of that falls away. Everything becomes automated, and there is no longer any check on the dictator's ruthlessness and evil at all.

Realistically, the self-improving AI will depose the dictator and then do who knows what. But do we actually need to convince people of that, given that it's a hard sell? If people become convinced "Uncontrolled AI research leads to dictatorship", won't that have all the policy effects we need?

The key is not to hyper-optimize specific messages, but rather to develop dialogues with the "spirits" or perspectives that tons of people share.

They don't have to understand on the first try, and we don't need to fit all the essentials into the perfect message. We just need to keep going back and forth, adjust what we say based on the Other's response, and course-correct until we hit the target.

This is also the way because we need action, not just understanding. What good is it to convince people if they then end up paralyzed with akrasia or otherwise unhelpful or anti-helpful?

With the dialogical approach, if we ever do succeed at arriving at sufficient understanding, we'll get a fair amount of action thrown in for free!

Thank you for this great post. I think https://forum.effectivealtruism.org/ could benefit from this as well!

I've posted it there. Had to use a linkpost because I didn't have an existing account there and you can't crosspost without 100 karma (presumably to prevent spam) and you can't funge LW karma for EAF karma.
