How can we avoid an uncontrollable AI if regulation to prevent it is not feasible?

Alignment of a superintelligent AI with human values is very difficult, if possible at all. Given the current speed in AI development, it seems unlikely that we will have a solution for the alignment problem before we can build an uncontrollable AI. A misaligned uncontrollable AI, however, will very likely destroy our future. If these assumptions are true, the only option we have is to not build one, at least until we can solve alignment.

A common objection to this is: “But that’s impossible, given the unilateralist’s curse. You can’t get the level of global coordination necessary to regulate AI so that nobody will develop an AGI. Even if you could, it’s impossible to enforce that regulation globally. Therefore, AGI is inevitable.”

AI governance is indeed very difficult. But if we can’t align an uncontrollable AI and regulation to prevent it isn’t feasible, then “dying with dignity” seems to be the only option.

However, there may be another alternative. Humans do not only coordinate through rules and regulations. Sometimes, a sufficient level of common knowledge is enough.

Common knowledge as a tool for coordination

De Freitas et al. have shown that common knowledge is an important factor in getting people to cooperate and coordinate. It works in two ways: each individual knows what the right thing to do is, and each individual knows that others know the same and will act accordingly. The latter makes it much easier to do the right thing in most cases.

There are two reasons, for instance, to stop at a red traffic light. On one hand, you know individually that you shouldn’t cross it and that if you do it anyway and get caught, you’ll get fined. But more importantly, you know that others will generally follow the traffic rules and expect you to do the same. If it’s green, you can trust that other drivers at a crossroads will stop at their red lights and not crash into you. Everyone follows these rules mainly because we trust in others to know and obey them. Of course, there are exceptions – people do ignore red lights sometimes – but they are relatively rare.

People coordinate by common knowledge all the time. We agree on common languages, legal rules, and standards of politeness. People show up at the same time at birthday parties, concerts, and soccer matches because everyone knows where the event will take place and when it starts. Money is probably the most powerful example of coordination by common knowledge: A hundred-dollar note is only worth anything if everyone believes in its worth. If people lose that faith, the value of a currency goes down and inflation goes up. The same is true for many things we regard as valuable, like an NFT or an original painting by Picasso.

We also agree on things we shouldn’t do. We don’t let our children play on the highway. We don’t eat the first unknown mushroom we find in the woods. We don’t climb into a cage in the zoo to pet the tigers. It’s common knowledge that these things are dangerous and no one in their right mind would do them. There’s even a satirical award for people who are stupid enough to do obviously dangerous things anyway, precisely because people rarely do them.

The big advantage of coordination by common knowledge is that you don’t necessarily have to enforce regulation to prevent bad things from happening. Even if there were no fines, most people would still stop at red traffic lights. However, it’s difficult to prevent 100% of bad things this way, so coordination by common knowledge is usually combined with regulation to reduce the chance that people who are ignorant of the common knowledge or choose to ignore it can do bad things. For example, to drive a car you have to be old enough, need a driver’s license, and must not be under the influence of alcohol or drugs.

Coordination by common knowledge also has its downsides. It is sometimes difficult to create the necessary level of common knowledge, especially if that knowledge is controversial. False common knowledge can be used to deceive and mislead people, for example by manipulating their political opinions with fake news. People can get caught in a bubble, accepting false common knowledge in their social group, like COVID deniers or flat earthers. But this only illustrates how powerful a coordination tool common knowledge is.

Using common knowledge to help avoid uncontrollable AI

To avoid uncontrollable AI in this way, we would need a high degree of common knowledge about how an AI could become uncontrollable, why it is difficult to align it to our values, and why a misaligned uncontrollable AI would very likely destroy our future. This is certainly difficult to achieve, but maybe not impossible.

The first thing that we would need to establish is the common knowledge that uncontrollable AI can actually be prevented. If everyone believes this is impossible and AGI, which would ultimately be uncontrollable, is inevitable, no one will be motivated to end the race for AGI. People will fall for the illusion that it is better if they develop an AGI before someone else does. This is a fallacy: after all, it doesn’t really matter who develops the uncontrollable AI that destroys the world. The belief that AGI is inevitable may turn out to be a deadly self-fulfilling prophecy.
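The self-fulfilling character of this belief can be illustrated with a small coordination game. The following is only a minimal sketch: the two-lab setup, the action names, and all payoff numbers are hypothetical assumptions chosen to give a stag-hunt structure, and they describe how a lab might perceive its options, not the actual outcome of a race (which, as argued above, is bad for everyone no matter who "wins").

```python
from itertools import product

ACTIONS = ("refrain", "race")

# Perceived payoffs for one lab, indexed by (own action, other lab's action).
# All numbers are hypothetical and only serve to create a stag-hunt structure.
PERCEIVED_PAYOFF = {
    ("refrain", "refrain"): 3,  # nobody builds AGI; the future stays open
    ("refrain", "race"): 0,     # the other lab gets there first
    ("race", "refrain"): 2,     # perceived lead, bought with added risk
    ("race", "race"): 1,        # full race; high risk for everyone
}


def expected_payoff(action, p_other_races):
    """Expected perceived payoff, given the believed probability that the other lab races."""
    return ((1 - p_other_races) * PERCEIVED_PAYOFF[(action, "refrain")]
            + p_other_races * PERCEIVED_PAYOFF[(action, "race")])


def best_response(p_other_races):
    """Action that looks best under the given belief about the other lab."""
    return max(ACTIONS, key=lambda a: expected_payoff(a, p_other_races))


def pure_equilibria():
    """Action pairs in which each lab best-responds to the other's actual choice."""
    certain = {"refrain": 0.0, "race": 1.0}
    return [
        (a, b)
        for a, b in product(ACTIONS, repeat=2)
        if best_response(certain[b]) == a and best_response(certain[a]) == b
    ]


if __name__ == "__main__":
    for p in (0.1, 0.9):
        print(f"believed P(other races) = {p}: best response = {best_response(p)}")
    print("pure equilibria:", pure_equilibria())
```

Under these assumed numbers, a lab that thinks the other is unlikely to race prefers to refrain, while a lab that thinks a race is near certain prefers to race. Both "everyone refrains" and "everyone races" are stable outcomes, so which one we end up in depends on what each side believes about the other. That is the sense in which common knowledge, rather than enforcement alone, can decide the result.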

AGI is not truly inevitable. We haven’t built it yet. We can still collectively decide not to build it, at least until we have solved the alignment problem. We don’t even need it for an amazing future.

At first glance, history seems to show that humans have always done every stupid thing they could. Whenever something was technically possible, people built it. Nuclear bombs seem to be an obvious example: we have created enough of them to destroy humanity many times over. During the Cold War, many people thought that a nuclear war was inevitable. Yet, even though there were some close calls, we have managed to avoid it so far. The main reason for this is the common knowledge of mutually assured destruction: if one side attacks the other with nuclear weapons, there will be retaliation and both sides will lose more than they could ever gain.

This kind of equilibrium is of course fragile. Through nuclear proliferation, more and more nations get the ability to start a nuclear war. This increases the probability that it will happen someday, either by accident or initiated by some mad dictator who thinks he has nothing to lose. So in the long run, a nuclear war, at least a local one, seems inevitable. However, each year without a nuclear war is a good year. It buys us time to find better ways to mitigate the risks, e.g. by governing nuclear weapons with international treaties. Maybe we’ll even manage to get rid of them completely one day, for example if we can create a stable world order. This is difficult, but maybe not impossible. And if we survive long enough, we may spread across the galaxy, so that even a global nuclear war would no longer destroy human civilization.

There are other examples of things we’re not doing although we could. We have banned biological weapons (although there still seem to be some unhealthy experiments in secret labs). We largely refrain from doing genetic experiments on human embryos. We have stopped sacrificing humans to appease the gods. In most parts of the world, slavery is illegal. Regulation plays a role in these examples, but what came first was a common understanding that these things were bad and shouldn’t be done.

Of course, there are also counterexamples. We have not managed to create a high enough level of common knowledge about the COVID pandemic to get everyone vaccinated and wearing masks. Although there is a lot of common knowledge about climate change, many people, corporations, and governments largely ignore it and act as if it weren’t happening.

To avoid the creation of an uncontrollable AI, the overlap of people who want to create one with people who are able to do so must be precisely zero. The size of the first group can be reduced by creating as much common knowledge about the dangers and the difficulties of alignment as possible. However, the second group must also be kept as small as possible. For that, regulation is needed, for example by tracking GPUs and TPUs and possibly restricting access to computing power. This is a complex task and not the topic of this post.

Given all the difficulties, why should we expect that it is possible to coordinate by common knowledge enough to prevent an uncontrollable AI?

There are some reasons to be hopeful:

  1. Almost no one wants to destroy the world. Even terrorists usually want to destroy only parts of it.
  2. There is no rational economic argument for creating a misaligned uncontrollable AI. The net present value of destroying the world is infinitely negative.
  3. The dangers of uncontrollable AI are relatively easy to explain and understand.
  4. Currently, only a limited number of people have access to the technology necessary to create an uncontrollable AI. These people are usually pretty smart and most of them seem to be at least partly aware of the risks.
  5. We haven’t built it yet, so there is nothing of value we already have achieved that we would need to give up in order to prevent uncontrollable AI. Also, not developing one will not incur any actual costs. This is different from most other coordination problems, e.g. climate change.
  6. The general public is already quite concerned about the fast progress in AI.

But there are also a number of barriers to creating the necessary level of common knowledge:

  1. It is not well understood what exactly makes an AI uncontrollable and there seems to be little research currently being done to better understand it.
  2. There are huge potential economic benefits from developing a powerful AI that is almost, but not quite uncontrollable, so some people may be willing to take huge risks.
  3. As we don’t understand the inner workings of large neural networks, some developers may be overconfident about their abilities to control the AI they develop, or they may be unaware that they are about to release an uncontrollable AI.
  4. Many people don’t believe yet that the problem is real in the near-term because of the availability heuristic (they haven’t seen an AI getting out of control yet, so they intuitively think it is unlikely), but also because they underestimate the capabilities of current AIs (e.g. calling them “stochastic parrots”) and overestimate the difficulty of surpassing human intelligence.
  5. With the rapid advances of technology, the number of people with access to the necessary technology to create an uncontrollable AI could be growing fast.
  6. Many people believe that AGI is inevitable and that coordination to prevent uncontrollable AI is impossible, so they don’t even try (see above).

How to create the necessary common knowledge

To overcome these barriers, we probably need more research to better understand what exactly makes an AI uncontrollable, so we can draw “red lines” that mustn’t be crossed. For these red lines to be commonly accepted, the underlying research must be common knowledge among all who might cross them. We also need a common understanding of the dangers of uncontrollable AI so people know why it would be stupid to create one.

The latter is mainly a communication problem. The arguments why it’s not a good idea to create an AI that is smarter than a human and not aligned with our values are already on the table. The one example we know of a superior intelligence taking over the world – Homo sapiens killing off all other hominid species, destroying most natural habitats, changing Earth’s climate, and causing a mass extinction – speaks for itself. Many laypeople intuitively understand this. But there are a lot of AI risk deniers who for various reasons disregard these arguments, mostly without even engaging with them.

It takes time, patience, and a lot of effort to convince these people. However, the Overton window seems to be shifting right now. The amazing capabilities of ChatGPT and GPT-4 have made the claims that LLMs are just “stochastic parrots” increasingly unconvincing. The open letter by the Future of Life Institute has drawn a lot of media attention. Geoffrey Hinton’s departure from Google to warn about the risks of advanced AI has given the field of AI safety additional credibility. The leaders of DeepMind, Google, Microsoft, and OpenAI have even publicly stated that they are at least partly aware of AI risks.

Of course, this doesn’t mean that everything is fine. Currently, the race for AGI is still fully underway. We need a major coordination effort to stop it. We need the leaders of the top AI labs to come together and agree that they’ll abandon this race and not risk our future by blindly pushing ahead. 

The necessary prerequisite for this to happen is that those leaders all share the same common knowledge that if they continue the race, someone will likely create a misaligned uncontrollable AI which would destroy our future. And they need to know that the others understand this as well.

Things that could be helpful to achieve a common understanding of the dangers of uncontrollable AI (incomplete list):

  • As much research as possible about how AI could become uncontrollable and where to draw red lines
  • Specific examples and explanations of how things can go wrong (e.g. specific failure stories, but also experiments and possibly real-world accidents as “warning shots”)
  • Public declarations by renowned leaders and experts about the dangers of uncontrollable AI
  • Public outcries and protests by concerned laypeople, as long as they are legal, nonviolent, and based on valid, fact-based arguments
  • Balanced, well-informed news articles and opinion pieces
  • Well-made information resources (e.g. Rob Miles’ videos, aisafety.info, or this talk)
  • Podcasts, TV interviews etc. with AI safety experts
  • Well-made documentaries about the dangers of uncontrollable AI
  • Well-written, realistic fictional movies and books (the movie “The Day After” supposedly made Ronald Reagan take the risks of a nuclear war more seriously, “War Games” achieved the same for cybersecurity)

Things that may not be helpful (incomplete list):

  • Claims that AGI is “inevitable” and coordination to prevent uncontrollable AI is impossible
  • Unjustified optimism about AI alignment (e.g. “RLHF will be sufficient”, “We will solve it in time even if we don’t know yet how”, “We will design them to be submissive to us”)
  • Downplaying the risks, ridiculing those who are concerned, or calling them “scaremongers” or “luddites”
  • Insincere statements about AI safety concerns (e.g. saying “we are aware of the risks and know we have to be cautious” while recklessly pushing capabilities development to gain competitive advantage, calling for a pause in the development of LLMs and at the same time founding a start-up to advance capabilities, etc.)
  • Using heuristics instead of arguments (e.g. “people have always been afraid of technology, therefore concerns about AI safety are unjustified”)
  • Anthropomorphizing (e.g. “AI will never be conscious”, “the human brain is much more complex than people think, therefore AGI is much harder than they expect”, etc.)
  • Violent or illegal protests
  • Divisiveness (e.g. demeaning or insulting people with opposing views, including the leaders of the big AI labs) and anything that inhibits a productive discussion
  • Politicization of AI safety
  • Not speaking out publicly about the dangers of uncontrollable AI out of fear for one’s personal reputation or job prospects

Conclusion

The possibility of coordination by common knowledge seems to be largely neglected in the current discussion about AI safety. There are many difficulties and possible objections, but declaring coordination to be impossible before even trying is giving up one of the few options left to us to avert an existential catastrophe from uncontrollable AI.

2 comments

“A hundred-dollar note is only worth anything if everyone believes in its worth. If people lose that faith, the value of a currency goes down and inflation goes up.”

Ah, the condition for the reality of money is much weaker though - you only have to believe that you will be able to find "someone" who believes they can find someone for whom money will be worth something, no need to involve "everyone" in one's reasoning.

Inflation is much more complicated of course, but in essence, to be incentivized to increase prices you only have to believe that other people believe that money is losing value and will buy the same thing from you for a higher price. You don't have to believe that you yourself will be able to buy less from your suppliers; increasing the price for higher profits is a totally valid reason for doing so.

This is also a kind of "coordination by common knowledge", but the parties involved don't have to share the same "knowledge" per se - consumers might believe "prices are higher because of inflation" while retailers might believe "we can make prices higher because people believe in inflation"...

Not sure myself whether the search for coordination by common knowledge incentivizes deceptive alignment "by default" (having an exponentially larger basin) or if some reachable policy can incentivize true alignment 🤷

Yes, thanks for the clarification! I was indeed oversimplifying a bit.