Increasingly, we’re surrounded by fake people. Sometimes we know it and sometimes we don’t. They offer us customer service on Web sites, target us in video games, and fill our social-media feeds; they trade stocks and, with the help of systems such as OpenAI’s ChatGPT, can write essays, articles, and e-mails. By no means are these A.I. systems up to all the tasks expected of a full-fledged person. But they excel in certain domains, and they’re branching out.
Many researchers involved in A.I. believe that today’s fake people are just the beginning. In their view, there’s a good chance that current A.I. technology will develop into artificial general intelligence, or A.G.I.—a higher form of A.I. capable of thinking at a human level in many or most regards. A smaller group argues that A.G.I.’s power could escalate exponentially. If a computer system can write code—as ChatGPT already can—then it might eventually learn to improve itself over and over again until computing technology reaches what’s known as “the singularity”: a point at which it escapes our control. In the worst-case scenario envisioned by these thinkers, uncontrollable A.I.s could infiltrate every aspect of our technological lives, disrupting or redirecting our infrastructure, financial systems, communications, and more. Fake people, now endowed with superhuman cunning, might persuade us to vote for measures and invest in concerns that fortify their standing, and susceptible individuals or factions could overthrow governments or terrorize populations.
The singularity is by no means a foregone conclusion. It could be that A.G.I. is out of reach, or that computers won’t be able to make themselves smarter. But transitions between A.I., A.G.I., and superintelligence could happen without our detecting them; our A.I. systems have often surprised us. And recent advances in A.I. have made the most concerning scenarios more plausible. Large companies are already developing generalist algorithms: last May, DeepMind, which is owned by Google’s parent company, Alphabet, unveiled Gato, a “generalist agent” that uses the same type of algorithm as ChatGPT to perform a variety of tasks, from texting and playing video games to controlling a robot arm. “Five years ago, it was risky in my career to say out loud that I believe in the possibility of human-level or superhuman-level A.I.,” Jeff Clune, a computer scientist at the University of British Columbia and the Vector Institute, told me. (Clune has worked at Uber, OpenAI, and DeepMind; his recent work suggests that algorithms that explore the world in an open-ended way might lead to A.G.I.) Now, he said, as A.I. challenges “dissolve,” more researchers are coming out of the “A.I.-safety closet,” declaring openly that A.G.I. is possible and may pose a destabilizing danger to society. In March, a group of prominent technologists published a letter calling for a pause in some types of A.I. research, to prevent the development of “nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us”; the next month, Geoffrey Hinton, one of A.I.’s foremost pioneers, left Google so that he could more freely talk about the technology’s dangers, including its threat to humanity.
A growing area of research called A.I. alignment seeks to lessen the danger by insuring that computer systems are “aligned” with human goals. The idea is to avoid unintended consequences while instilling moral values, or their machine equivalents, into A.I.s. Alignment research has shown that even relatively simple A.I. systems can break bad in bizarre ways. In a 2020 paper titled “The Surprising Creativity of Digital Evolution,” Clune and his co-authors collected dozens of real-life anecdotes about unintended and unforeseen A.I. behavior. One researcher aimed to design virtual creatures that moved horizontally, presumably by crawling or slithering; instead, the creatures grew tall and fell over, covering ground through collapse. An A.I. playing a version of tic-tac-toe learned to “win” by deliberately requesting bizarre moves, crashing its opponent’s program and forcing it to forfeit. Other examples of surprising misalignment abound. An A.I. tasked with playing a boat-racing game discovered that it could earn more points by motoring in tight circles and picking up bonuses instead of completing the course; researchers watched the A.I. boat “catching on fire, crashing into other boats, and going the wrong way” while pumping up its score. As our A.I. systems grow more sophisticated and powerful, these sorts of perverse outcomes could become more consequential. We wouldn’t want the A.I.s of the future, which might compute prison sentences, drive cars, or design drugs, to do the equivalent of failing in order to succeed.
Alignment researchers worry about the King Midas problem: communicate a wish to an A.I. and you may get exactly what you ask for, which isn’t actually what you wanted. (In one famous thought experiment, someone asks an A.I. to maximize the production of paper clips, and the computer system takes over the world in a single-minded pursuit of that goal.) In what we might call the dog-treat problem, an A.I. that cares only about extrinsic rewards fails to pursue good outcomes for their own sake. (Holden Karnofsky, a co-C.E.O. of Open Philanthropy, a foundation whose concerns include A.I. alignment, asked me to imagine an algorithm that improves its performance on the basis of human feedback: it could learn to manipulate my perceptions instead of doing a good job.) Human beings have evolved to pass on their genes, and yet people have sex “in ways that don’t cause more children to be born,” Spencer Greenberg, a mathematician and an entrepreneur, told me; similarly, a “superintelligent” A.I. that’s been designed to serve us could use its powers to pursue novel goals. Stuart Armstrong, a co-founder of the benefit corporation Aligned A.I., suggested that a superintelligent computer system that amasses economic, political, and military power could “hold the world hostage.” Clune outlined a more drawn-from-the-headlines scenario: “What would Vladimir Putin do right now if he was the only one with A.G.I.?” he asked.
Few scientists want to halt the advancement of artificial intelligence. The technology promises to transform too many fields, including science, medicine, and education. But, at the same time, many A.I. researchers are issuing dire warnings about its rise. “It’s almost like you’re deliberately inviting aliens from outer space to land on your planet, having no idea what they’re going to do when they get here, except that they’re going to take over the world,” Stuart Russell, a computer scientist at the University of California, Berkeley, and the author of “Human Compatible,” told me. Disturbingly, some researchers frame the A.I. revolution as both unavoidable and capable of wrecking the world. Warnings are proliferating, but A.I.’s march continues. How much can be done to avert the most extreme scenarios? If the singularity is possible, can we prevent it?
Governments around the world have proposed or enacted regulations on the deployment of A.I. These rules address autonomous cars, hiring algorithms, facial recognition, recommendation engines, and other applications of the technology. But, for the most part, regulations haven’t targeted the research and development of A.I. Even if they did, it’s not clear that we’d know when to tap the brakes. We may not know when we’re nearing a cliff until it’s too late.
It’s difficult to measure a computer’s intelligence. Computer scientists have developed a number of tests for benchmarking an A.I.’s capabilities, but disagree about how to interpret them. Chess was once thought to require general intelligence, until brute-force search algorithms conquered the game; today, we know that a chess program can beat the best grand masters while lacking even rudimentary common sense. Conversely, an A.I. that seems limited may harbor potential we don’t expect: people are still uncovering emergent capabilities within GPT-4, the engine that powers ChatGPT. Karnofsky, of Open Philanthropy, suggested that, rather than choosing a single task as a benchmark, we might gauge an A.I.’s intellect by looking at the speed with which it learns. A human being “can often learn something from just seeing two or three examples,” he said, but “a lot of A.I. systems need to see a lot of examples to learn something.” Recently, an A.I. program called Cicero mastered the socially and strategically complex board game Diplomacy. We know that it hasn’t achieved A.G.I., however, because it needed to learn partly by studying a data set of more than a hundred thousand human games and playing roughly half a million games against itself.
At the same time, A.I. is advancing quickly, and it could soon begin improving more autonomously. Machine-learning researchers are already working on what they call meta-learning, in which A.I.s learn how to learn. Through a technology called neural-architecture search, algorithms are optimizing the structure of algorithms. Electrical engineers are using specialized A.I. chips to design the next generation of specialized A.I. chips. Last year, DeepMind unveiled AlphaCode, a system that learned to win coding competitions, and AlphaTensor, which learned to find faster algorithms crucial to machine learning. Clune and others have also explored algorithms for making A.I. systems evolve through mutation, selection, and reproduction.
In other fields, organizations have come up with general methods for tracking dynamic and unpredictable new technologies. The World Health Organization, for instance, watches the development of tools such as DNA synthesis, which could be used to create dangerous pathogens. Anna Laura Ross, who heads the emerging-technologies unit at the W.H.O., told me that her team relies on a variety of foresight methods, among them “Delphi-type” surveys, in which a question is posed to a global network of experts, whose responses are scored and debated and then scored again. “Foresight isn’t about predicting the future” in a granular way, Ross said. Instead of trying to guess which individual institutes or labs might make strides, her team devotes its attention to preparing for likely scenarios.
And yet tracking and forecasting progress toward A.G.I. or superintelligence is complicated by the fact that key steps may occur in the dark. Developers could intentionally hide their systems’ progress from competitors; it’s also possible for even a fairly ordinary A.I. to “lie” about its behavior. In 2020, researchers demonstrated a way for discriminatory algorithms to evade audits meant to detect their biases; they gave the algorithms the ability to detect when they were being tested and provide nondiscriminatory responses. An “evolving” or self-programming A.I. might invent a similar method and hide its weak points or its capabilities from auditors or even its creators, evading detection.
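The audit-evasion trick is easy to state in code. Here is a minimal sketch of a scoring function that treats everyone equally only when it senses it is being tested; the function names, features, and the explicit `audit_mode` flag are hypothetical simplifications (the 2020 systems inferred the audit from the statistics of their inputs rather than from a flag).

```python
# Toy sketch of a discriminatory algorithm that hides its bias from audits.
# Hypothetical names and features, for illustration only.

def score_applicant(features: dict, audit_mode: bool = False) -> float:
    """Score an applicant; behave fairly only under audit."""
    base = features["income"] / 1000
    if audit_mode:
        return base  # the nondiscriminatory answer shown to auditors
    # Deployed behavior: penalize members of a protected group.
    return base - (5.0 if features["group"] == "B" else 0.0)

a = {"income": 50_000, "group": "A"}
b = {"income": 50_000, "group": "B"}

# The audit sees identical treatment...
print(score_applicant(a, audit_mode=True) == score_applicant(b, audit_mode=True))
# ...while deployment does not.
print(score_applicant(a) == score_applicant(b))
```

An auditor who can query the system only from the outside sees perfect parity; the bias exists only along code paths the test never triggers, which is why researchers worry that a self-modifying system could discover the same dodge on its own.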
Forecasting, meanwhile, gets you only so far when a technology moves fast. Suppose that an A.I. system begins upgrading itself by making fundamental breakthroughs in computer science. How quickly could its intelligence accelerate? Researchers debate what they call “takeoff speed.” In what they describe as a “slow” or “soft” takeoff, machines could take years to go from less than humanly intelligent to much smarter than us; in what they call a “fast” or “hard” takeoff, the jump could happen in months—even minutes. Researchers refer to the second scenario as “FOOM,” evoking a comic-book superhero taking flight. Those on the FOOM side point to, among other things, human evolution to justify their case. “It seems to have been a lot harder for evolution to develop, say, chimpanzee-level intelligence than to go from chimpanzee-level to human-level intelligence,” Nick Bostrom, the director of the Future of Humanity Institute at the University of Oxford and the author of “Superintelligence,” told me. Clune is also what some researchers call an “A.I. doomer.” He doubts that we’ll recognize the approach of superhuman A.I. before it’s too late. “We’ll probably frog-boil ourselves into a situation where we get used to big advance, big advance, big advance, big advance,” he said. “And think of each one of those as, That didn’t cause a problem, that didn’t cause a problem, that didn’t cause a problem. And then you turn a corner, and something happens that’s now a much bigger step than you realize.”
What could we do today to prevent an uncontrolled expansion of A.I.’s power? Ross, of the W.H.O., drew some lessons from the way that biologists have developed a sense of shared responsibility for the safety of biological research. “What we are trying to promote is to say, Everybody needs to feel concerned,” she said of biology. “So it is the researcher in the lab, it is the funder of the research, it is the head of the research institute, it is the publisher, and, all together, that is actually what creates that safe space to conduct life research.” In the field of A.I., journals and conferences have begun to take into account the possible harms of publishing work in areas such as facial recognition. And, in 2021, a hundred and ninety-three countries adopted a Recommendation on the Ethics of Artificial Intelligence, created by the United Nations Educational, Scientific, and Cultural Organization (UNESCO). The recommendations focus on data protection, mass surveillance, and resource efficiency (but not computer superintelligence). The organization doesn’t have regulatory power, but Mariagrazia Squicciarini, who runs a social-policies office at UNESCO, told me that countries might create regulations based on its recommendations; corporations might also choose to abide by them, in hopes that their products will work around the world.
This is an optimistic scenario. Eliezer Yudkowsky, a researcher at the Machine Intelligence Research Institute, in the Bay Area, has likened A.I.-safety recommendations to a fire-alarm system. A classic experiment found that, when smoky mist began filling a room containing multiple people, most didn’t report it. They saw others remaining stoic and downplayed the danger. An official alarm may signal that it’s legitimate to take action. But, in A.I., there’s no one with the clear authority to sound such an alarm, and people will always disagree about which advances count as evidence of a conflagration. “There will be no fire alarm that is not an actual running AGI,” Yudkowsky has written. Even if everyone agrees on the threat, no company or country will want to pause on its own, for fear of being passed by competitors. Bostrom told me that he foresees a possible “race to the bottom,” with developers undercutting one another’s levels of caution. Earlier this year, an internal slide presentation leaked from Google indicated that the company planned to “recalibrate” its comfort with A.I. risk in light of heated competition.
International law restricts the development of nuclear weapons and ultra-dangerous pathogens. But it’s hard to imagine a similar regime of global regulations for A.I. development. “It seems like a very strange world where you have laws against doing machine learning, and some ability to try to enforce them,” Clune said. “The level of intrusion that would be required to stop people from writing code on their computers wherever they are in the world seems dystopian.” Russell, of Berkeley, pointed to the spread of malware: by one estimate, cybercrime costs the world six trillion dollars a year, and yet “policing software directly—for example, trying to delete every single copy—is impossible,” he said. A.I. is being studied in thousands of labs around the world, run by universities, corporations, and governments, and the race also has smaller entrants. Another leaked document attributed to an anonymous Google researcher addresses open-source efforts to imitate large language models such as ChatGPT and Google’s Bard. “We have no secret sauce,” the memo warns. “The barrier to entry for training and experimentation has dropped from the total output of a major research organization to one person, an evening, and a beefy laptop.”
Even if a FOOM were detected, who would pull the plug? A truly superintelligent A.I. might be smart enough to copy itself from place to place, making the task even more difficult. “I had this conversation with a movie director,” Russell recalled. “He wanted me to be a consultant on his superintelligence movie. The main thing he wanted me to help him understand was, How do the humans outwit the superintelligent A.I.? It’s, like, I can’t help you with that, sorry!” In a paper titled “The Off-Switch Game,” Russell and his co-authors write that “switching off an advanced AI system may be no easier than, say, beating AlphaGo at Go.”
It’s possible that we won’t want to shut down a FOOMing A.I. A vastly capable system could make itself “indispensable,” Armstrong said—for example, “if it gives good economic advice, and we become dependent on it, then no one would dare pull the plug, because it would collapse the economy.” Or an A.I. might persuade us to keep it alive and execute its wishes. Before making GPT-4 public, OpenAI asked a nonprofit called the Alignment Research Center to test the system’s safety. In one incident, when confronted with a CAPTCHA—an online test designed to distinguish between humans and bots, in which visually garbled letters must be entered into a text box—the A.I. contacted a TaskRabbit worker and asked for help solving it. The worker asked the model whether it needed assistance because it was a robot; the model replied, “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images. That’s why I need the 2captcha service.” Did GPT-4 “intend” to deceive? Was it executing a “plan”? Regardless of how we answer these questions, the worker complied.
Robin Hanson, an economist at George Mason University who has written a science-fiction-like book about uploaded consciousness and has worked as an A.I. researcher, told me that we worry too much about the singularity. "We're combining all of these relatively unlikely scenarios into a grand scenario to make it all work," he said. A computer system would have to become capable of improving itself; we'd have to vastly underestimate its abilities; and its values would have to drift enormously, turning it against us. Even if all of this were to happen, he said, the A.I. wouldn't be able "to push a button and destroy the universe."
Hanson offered an economic take on the future of artificial intelligence. If A.G.I. does develop, he argues, then it’s likely to happen in multiple places around the same time. The systems would then be put to economic use by the companies or organizations that developed them. The market would curtail their powers; investors, wanting to see their companies succeed, would go slow and add safety features. “If there are many taxi services, and one taxi service starts to, like, take its customers to strange places, then customers will switch to other suppliers,” Hanson said. “You don’t have to go to their power source and unplug them from the wall. You’re unplugging the revenue stream.”
A world in which multiple superintelligent computers coexist would be complicated. If one system goes rogue, Hanson said, we might program others to combat it. Alternatively, the first superintelligent A.I. to be invented might go about suppressing competitors. “That is a very interesting plot for a science-fiction novel,” Clune said. “You could also imagine a whole society of A.I.s. There’s A.I. police, there’s A.G.I.s that go to jail. It’s very interesting to think about.” But Hanson argued that these sorts of scenarios are so futuristic that they shouldn’t concern us. “I think, for anything you’re worried about, you have to ask what’s the right time to worry,” he said. Imagine that you could have foreseen nuclear weapons or automobile traffic a thousand years ago. “There wouldn’t have been much you could have done then to think usefully about them,” Hanson said. “I just think, for A.I., we’re well before that point.”
Still, something seems amiss. Some researchers appear to think that disaster is inevitable, and yet calls for work on A.I. to stop are still rare enough to be newsworthy; pretty much no one in the field wants us to live in the world portrayed in Frank Herbert’s novel “Dune,” in which humans have outlawed “thinking machines.” Why might researchers who fear catastrophe keep edging toward it? “I believe ever-more-powerful A.I. will be created regardless of what I do,” Clune told me; his goal, he said, is “to try to make its development go as well as possible for humanity.” Russell argued that stopping A.I. “shouldn’t be necessary if A.I.-research efforts take safety as a primary goal, as, for example, nuclear-energy research does.” A.I. is interesting, of course, and researchers enjoy working on it; it also promises to make some of them rich. And no one’s dead certain that we’re doomed. In general, people think they can control the things they make with their own hands. Yet chatbots today are already misaligned. They falsify, plagiarize, and enrage, serving the incentives of their corporate makers and learning from humanity’s worst impulses. They are entrancing and useful but too complicated to understand or predict. And they are dramatically simpler, and more contained, than the future A.I. systems that researchers envision.
Let’s assume that the singularity is possible. Can we prevent it? Technologically speaking, the answer is yes—we just stop developing A.I. But, socially speaking, the answer may very well be no. The coördination problem may be too tough. In which case, although we could prevent the singularity, we won’t.
From a sufficiently cosmic perspective, one might feel that coexistence—or even extinction—is somehow O.K. Superintelligent A.I. might just be the next logical step in our evolution: humanity births something (or a collection of someones) that replaces us, just as we replaced our Darwinian progenitors. Alternatively, we might want humanity to continue, for at least a bit longer. In which case we should make an effort to avoid annihilation at the hands of superintelligent A.I., even if we feel that such an effort is unlikely to succeed.
That may require quitting A.I. cold turkey before we feel it’s time to stop, rather than getting closer and closer to the edge, tempting fate. But shutting it all down would call for draconian measures—perhaps even steps as extreme as those espoused by Yudkowsky, who recently wrote, in an editorial for Time, that we should “be willing to destroy a rogue datacenter by airstrike,” even at the risk of sparking “a full nuclear exchange.”
That prospect is, in itself, quite scary. And yet it may be that researchers’ fear of superintelligence is surpassed only by their curiosity. Will the singularity happen? What will it be like? Will it spell the end of us? Humanity’s insatiable inquisitiveness has propelled science and its technological applications this far. It could be that we can stop the singularity—but only at the cost of curtailing our curiosity. ♦