A plea for solutionism on AI safety

jasoncrawford

[Note for LW: This essay was written mostly for people who are inclined to downplay or dismiss AI risk, which means it has almost no intersection with this audience. Cross-posting it here anyway for feedback and in case you want to point anyone to it.]

Will AI kill us all?

This question has rapidly gone mainstream. A few months ago, it wasn’t seriously debated very far outside the rationalist community of LessWrong; now it’s reported in major media outlets including the NY Times, The Guardian, the Times of London, BBC, WIRED, Time, Fortune, U.S. News, and CNBC.

For years, the rationalists lamented that the world was neglecting the existential risk from AI, and despaired of ever convincing the mainstream of the danger. But it turns out, of course, that our culture is fully prepared to believe that technology can be dangerous. The reason AI fears didn’t go mainstream earlier wasn’t society’s optimism, but its pessimism: most people didn’t believe AI would actually work. Once there was a working demo that got sufficient publicity, it took virtually no extra convincing to get people to be worried about it.

As usual, the AI safety issue is splitting people into two camps. One is pessimistic, often to the point of fatalism or defeatism: emphasizing dangers, ignoring or downplaying benefits, calling for progress to slow or stop, and demanding regulation. The other is optimistic, often to the point of complacency: dismissing the risks, and downplaying the need for safety.

If you’re in favor of technology and progress, it is natural to react to fears of AI doom with worry, anger, or disgust. It smacks of techno-pessimism, and it could easily lead to draconian regulations that kill this technology or drastically slow it down, depriving us all of its potentially massive benefits. And so it is tempting to line up with the techno-optimists, and to focus primarily on arguing against the predictions of doom. If you feel that way, this essay is for you.

I am making a plea for solutionism on AI safety. The best path forward, both for humanity and for the political battle, is to acknowledge the risks, help to identify them, and come up with a plan to solve them. How do we develop safe AI? And how do we develop AI safely?

Let me explain why I think this makes sense even for those of us who strongly believe in progress, and secondarily why I think it’s needed in the current political environment.

Safety is a part of progress

Humanity inherited a dangerous world. We have never known safety: fire, flood, plague, famine, wind and storm, war and violence, and the like have always been with us. Mortality rates are high as far back as we can measure them. Not only was death common, it was sudden and unpredictable. A shipwreck, a bout of malaria, or a mining accident could kill you quickly, at any age.

Over the last few centuries, technology has helped make our lives more comfortable and safer. But it also created new risks: boiler explosions, factory accidents, car and plane crashes, toxic chemicals, radiation.

When we think of the history of progress and the benefits it has brought, we should think not only of wealth measured in economic production. We should think also of the increase in health and safety.

Safety is an achievement. It is an accomplishment of progress—a triumph of reason, science, and institutions. Like the other accomplishments of progress, we should be proud of it—and we should be unsatisfied if we stall out at our current level. We should be restlessly striving for more. A world in which we continue to make progress should be not only a wealthier world, but a safer world.

We should (continue to) get more proactive about safety

Long ago, in a more laissez-faire world, technology came first and safety came later. The first automobiles didn’t have seat belts, or even turn signals. X-ray machines were used without shielding, and many X-ray technicians had to have hands amputated from radiation damage. Drugs were released on the market without testing and without quality control.

Safety was achieved in all those areas empirically, by learning from experience. When disasters happened, we would identify the root causes and implement solutions. This was a reliable path to safety. The only problem is that people had to die before safety measures were put in place.

So over time, especially in the 20th century, people called for more care to be taken up front. New drugs and consumer products are tested before going on the market. Buildings must meet code, and restaurants pass health inspection, before opening to the public.

Today we are much more cautious about introducing new technology. Consider how much safety testing has been done around self-driving cars, vs. how little testing was done on the first cars. Consider how much testing the first genetic therapies had, vs. the early pharmaceutical industry.

AI is an instance of this. We are still at the stage of chatbots and image generators, and yet already people are thinking ahead to, and even testing for, a wide range of possible harms.

In part, this reflects the achievement of safety itself: because the world we live in is so much safer, life has become more precious. When you could, any day, come down with cholera, or be caught in a mine collapse, or break your neck falling off a horse, you just didn’t worry as much about risks. We have pushed the background risk so low that people now demand that new technology start with that same high level of safety. This is rational.

Leave the argument behind

The rise of safety culture has not been entirely healthy.

Concern for safety has become an obsession. In the name of safety, we have stunted nuclear power, delayed lifesaving medical treatments, killed many useful clinical trials, and made it more difficult to have children, to name just a few examples.

Worse, safety is a favorite weapon of anyone who opposes any new technology. Such opposition tends to attract a “bootleggers and Baptists” coalition, as those who are sincerely concerned about safety are joined by those who cynically seek to protect their own interests by preventing competition.

This is not a unique feature of our modern, extremely safety-conscious world. It has always been this way. Even in 1820s Britain—which was so pro-progress that they built a statue to James Watt for “bestowing almost immeasurable benefits on the whole human race”—proposals for railroad transportation, for example, met with enormous opposition. One commenter thought that even eighteen to twenty miles per hour was far too fast, suggesting that people would rather strap themselves to a piece of rocket artillery than trust themselves to a locomotive going at such speeds. In a line that could have come from Ralph Nader, he expressed the hope that Parliament would limit speed to eight or nine miles an hour. This was before any passenger locomotive service had been established, anywhere.

So I understand why many who are in favor of technology, growth, and progress see safety as the enemy. But this is wrong. I see the trend towards greater safety, including safety work before new technologies are introduced, as something fundamentally good. We should reform our safety culture, but not abolish it.

I strongly encourage you, before you decide what you think on this issue or what you want to say about it publicly, to first think about what your position would be if there were no big political controversy about it. Try to leave the argument behind, drop any defensive posture, and think through the issue unencumbered.

I think if you do that, you will realize that of course there are risks to AI, as there are to almost any new technology (although in my opinion the biggest risks, and the most important safety measures, are some of the least discussed). You don’t even need to assume that AI will develop misaligned goals to see this: just imagine AI controlling cars, planes, cargo ships, power plants, factories, and financial markets, and you can see that even simple bugs in the software could create disasters.

We shouldn’t be against AI safety—done right—any more than we are against seat belts, fire alarms, or drug trials. And just as inventing seat belts was a part of progress in automobile technology, and developing the method of clinical trials was a part of progress in medicine, so designing and building appropriate AI safety mechanisms will be a part of progress in AI.

Safety is the only politically viable path anyway

I suggested that you “leave the argument behind” in order to think through your own position—but once you do that, you need to return to the argument, because it matters. The political context gives us an important secondary reason to focus on safety: it is the only politically viable path.

First, since risks do exist, we need to acknowledge them for the sake of credibility, especially in today’s safety-conscious society. If we dismiss them, most people won’t take us seriously.

Second, given that people are already worried, they are looking for solutions. If no one offers a solution that allows AI development to continue, then some other program will be adopted. Rather than try to convince people not to worry and not to act, it is better to suggest a reasonable course of action that they can follow.

Already the EU, true to form, is being fairly heavy-handed, while the UK has explicitly decided on a pro-innovation approach. But what will matter most is the US, which doesn’t know what it’s doing yet. Which way will this go? Will we get a new regulatory agency that must review all AI systems, demanding proof of safety before approval? (Microsoft has already called for “a new government agency,” and OpenAI has proposed “an international authority.”) This could end up like nuclear, where nothing gets approved and progress stalls for decades. Or, will we get an approach like the DOT’s plan for self-driving cars? In that field, R&D has moved forward and technology is being cautiously, incrementally, and safely deployed.

Summary: solutionism on safety

Instead of framing safety debates as optimism vs. pessimism, we should take a solutionist approach to safety—including for emerging technologies, and especially for AI.

In contrast to complacent optimism, we should openly acknowledge risks. Indeed, we should eagerly identify them and think them through, in order to be best prepared for them. But in contrast to defeatist pessimism, we should do this not in order to slow or stop progress, but to identify positive steps we can take towards safer technology.

Technologists should do this in part to get ahead of critics. They hurt their cause by being dismissive of risk: they lose credibility and reinforce the image of recklessness. But more importantly, they should do it because safety is part of progress.

One thing I'd like to see more of: attempts at voluntary compliance with proposed plans, and libraries and tools to support that.

I've seen suggestions to limit the compute power used on large training runs. Sounds great; might or might not be the answer, but if folks want to give it a try, let's help them. Where are the libraries that make it super easy to report the compute power used on a training run? To show a Merkle tree of the other models or input data that training run depends on? (Or, if extinction risk isn't your highest priority, to report which media by which people got incorporated, and what licenses they were used under?) How do those libraries support reporting by open-source efforts, and incremental reporting?
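To make the Merkle tree idea concrete, here is a minimal sketch of what such a report might look like. It is not an existing library: the names `merkle_root` and `build_report`, the report fields, and the choice to duplicate the last hash on odd-sized levels are all assumptions made for illustration.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(leaf_hashes: list[str]) -> str:
    """Fold a list of hex leaf hashes into a single Merkle root.

    Assumption: on a level with an odd number of nodes, the last hash is
    duplicated. Other conventions exist; this is just one simple choice.
    """
    if not leaf_hashes:
        return sha256_hex(b"")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [
            sha256_hex(bytes.fromhex(level[i]) + bytes.fromhex(level[i + 1]))
            for i in range(0, len(level), 2)
        ]
    return level[0]


def build_report(run_name: str, total_flops: float, dependencies: dict[str, bytes]) -> dict:
    """Build a hypothetical, JSON-serializable report for one training run.

    dependencies maps a name (dataset, parent model) to its raw bytes.
    Leaves are ordered by name so the root does not depend on dict order.
    """
    leaves = [sha256_hex(content) for _, content in sorted(dependencies.items())]
    return {
        "run_name": run_name,
        "total_flops": total_flops,
        "dependency_names": sorted(dependencies),
        "dependency_merkle_root": merkle_root(leaves),
    }


if __name__ == "__main__":
    report = build_report(
        "example-run",
        3.0e21,
        {"dataset-a": b"placeholder bytes", "parent-model": b"placeholder weights"},
    )
    print(json.dumps(report, indent=2))
```

A real tool would hash files in chunks rather than holding them in memory, and would need some way to attest that the reported numbers are true; this only shows that the reporting format itself could be small.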

What if the plan is alarm bells and shutdowns of concerning training runs? Or you're worried about model exfiltration by spies or rogue employees? Are there tools that make it easy to report what steps you're taking to prevent that? That make it easy to provide good security against those threat models? Where's the best practices guide?

We don't have a complete answer. But we have some partial answers, or steps that might move in the right direction. And right now, actually taking those next steps looks hard for marginal people who are kinda on the fence about how to trade capabilities progress against security and alignment work. Or at least harder than I can imagine it being.

(On a related note, I think the intersection of security and alignment is a fruitful area to apply more effort.)

Do you disagree with Apollo's or ARC Evals' approaches to voluntary compliance solutions?

I think neither. Or rather, I support it, but that's not quite what I had in mind with the above comment, unless there's specific stuff they're doing that I'm not aware of. (Which is entirely possible; I'm following this work only loosely, and not in detail. If I'm missing something, I would be very grateful for more specific links to stuff I should be reading. Git links to usable software packages would be great.)

What I'm looking for mostly, at the moment, is software tools that could be put to use. A library, a tutorial, a guide for how to incorporate that library into your training run, and a result of better compliance with voluntary reporting. What I've seen so far is mostly high-effort investigative reports and red-teaming efforts.

Best practices around how to evaluate models, and high-effort things you can do while making them, are also great. But I'm specifically looking for tools that enable low-effort compliance and reporting options while people are doing the same stuff they otherwise would be. I think that would complement the suggestions for high-effort best practices.

The output I'd like to see is things like machine-parseable quantification of flops used to generate a model, such that a derivative model would specify both total and marginal flops used to create it.
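As a rough sketch of what that output could look like, the record below stores only the marginal flops for each model and derives the total by walking up to its parent. The class name `ComputeRecord` and the JSON field names are hypothetical, chosen for illustration rather than taken from any existing standard or tool.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComputeRecord:
    """Hypothetical compute record for one model.

    marginal_flops is the compute spent on this training run alone;
    parent, if set, is the record of the model it was derived from.
    """

    model_name: str
    marginal_flops: float
    parent: Optional["ComputeRecord"] = None

    @property
    def total_flops(self) -> float:
        """Marginal flops of this run plus the total of all ancestors."""
        parent_total = self.parent.total_flops if self.parent else 0.0
        return parent_total + self.marginal_flops

    def to_json(self) -> str:
        """Serialize to a machine-parseable JSON string."""
        return json.dumps(
            {
                "model_name": self.model_name,
                "marginal_flops": self.marginal_flops,
                "total_flops": self.total_flops,
                "parent": self.parent.model_name if self.parent else None,
            },
            sort_keys=True,
        )


if __name__ == "__main__":
    base = ComputeRecord("base-model", 1.0e21)
    finetuned = ComputeRecord("finetuned-model", 2.0e19, parent=base)
    print(base.to_json())
    print(finetuned.to_json())
```

Storing the marginal figure and deriving the total means a derivative model only has to report its own run, which seems like the lower-effort option for open-source efforts that build on someone else's weights.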

I’m pretty confident the primary labs keep track of the number of flops used to train their models. I also don’t know how such a tool would prevent us all from dying.

I don't know how it prevents us from dying either! I don't have a plan that accomplishes that; I don't think anyone else does either. If I did, I promise I'd be trying to explain it.

That said, I think there are pieces of plans that might help buy time, or might combine with other pieces to do something more useful. For example, we could implement regulations that take effect above a certain model size or training effort. Or that prevent putting too many flops worth of compute in one tightly-coupled cluster.

One problem with implementing those regulations is that there's disagreement about whether they would help. But that's not the only problem. Other problems are things like: how hard would they be to comply with and audit compliance with? Is compliance even possible in an open-source setting? Will those open questions get used as excuses to oppose them by people who actually object for other reasons?

And then there's the policy question of how we move from the no-regulations world of today to a world with useful regulations, assuming that's a useful move. So the question I'm trying to attack is: what's the next step in that plan? Maybe we don't know because we don't know what the complete plan is or whether the later steps can work at all, but are there things that look likely to be useful next steps that we can implement today?

One set of answers to that starts with voluntary compliance. Signing an open letter creates common knowledge that people think there's a problem. Widespread voluntary compliance provides common knowledge that people agree on a next step. But before the former can happen, someone has to write the letter and circulate it and coordinate getting signatures. And before the latter can happen, someone has to write the tools.

So a solutionism-focused approach, as called for by the post I'm replying to, is to ask what the next step is. And when the answer isn't yet actionable, break that down further until it is. My suggestion was intended to be one small step of many, that I haven't seen discussed much as a useful next step.

There is a view of "pessimistic solutionism", and "optimistic solutionism".

In optimistic solutionism, you think that yes, it's possible to make mistakes, but only if you're really trying to screw up. Basically any attempt at safety is largely going to work. Safety is easy. The consequences of a few mistakes are mild, so we can find out by trial and error.

In pessimistic solutionism, you think doing it safely is theoretically possible, but really hard. The whole field is littered with subtle booby traps. The first 10 things you think of to try to make things safe don't work for complicated reasons. There is no simple safe way to test whether something is safe. The consequences of one mistake anywhere can doom all humanity.

With optimistic solutionism, it's a case of "go right ahead, oh and keep an eye on safety".

What about pessimistic solutionism? What should you do when safe AI is in theory possible to make, but really, really hard? Perhaps try to halt AI progress until we have figured out how to do it safely? Take things slow. Organize one institution that will go as slowly and carefully as possible, while banning anyone faster and more reckless.

I think this is the world we are in. Safe AI isn't impossible, but it is really hard.

Solutionism isn't opposed to optimism and pessimism. It's a separate axis.

An accurate model of the future should include solutions we haven't invented yet, and also problems we haven't discovered yet. Both of these can be predictable, or can be hard to predict.

The correct large scale societal reaction is to try to solve the problem. But sometimes you personally have nowhere near the skills/resources/comparative advantage in solving it, so you leave it to someone else.

In the example given of solutionism, the difficulty of fixing atmospheric nitrogen depended on the details of chemistry. If it had been easier, then it would have been a case of "we have basically almost got this working, keep a few people finishing off the details and it will be fine."

If nitrogen fixation had turned out to be harder, then a period of belt-tightening and sharp nitrogen rationing, along with a massive, well-funded program to find a solution ASAP, would have been required.

Reality is not required to send you problems within your capability to solve. Reality also sends you problems that are possible to solve, but not without a lot of difficulties and tradeoffs.