Pausing AI Developments Isn't Enough. We Need to Shut it All Down

Eliezer Yudkowsky

(Published in TIME on March 29.)

An open letter published today calls for “all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4.”

This 6-month moratorium would be better than no moratorium. I have respect for everyone who stepped up and signed it. It’s an improvement on the margin.

I refrained from signing because I think the letter is understating the seriousness of the situation and asking for too little to solve it.

The key issue is not “human-competitive” intelligence (as the open letter puts it); it’s what happens after AI gets to smarter-than-human intelligence. Key thresholds there may not be obvious, we definitely can’t calculate in advance what happens when, and it currently seems imaginable that a research lab would cross critical lines without noticing.

Many researchers steeped in these issues, including myself, expect that the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die. Not as in “maybe possibly some remote chance,” but as in “that is the obvious thing that would happen.” It’s not that you can’t, in principle, survive creating something much smarter than you; it’s that it would require precision and preparation and new scientific insights, and probably not having AI systems composed of giant inscrutable arrays of fractional numbers.

Without that precision and preparation, the most likely outcome is AI that does not do what we want, and does not care for us nor for sentient life in general. That kind of caring is something that could in principle be imbued into an AI but we are not ready and do not currently know how.

Absent that caring, we get “the AI does not love you, nor does it hate you, and you are made of atoms it can use for something else.”

The likely result of humanity facing down an opposed superhuman intelligence is a total loss. Valid metaphors include “a 10-year-old trying to play chess against Stockfish 15,” “the 11th century trying to fight the 21st century,” and “Australopithecus trying to fight Homo sapiens.”

To visualize a hostile superhuman AI, don’t imagine a lifeless book-smart thinker dwelling inside the internet and sending ill-intentioned emails. Visualize an entire alien civilization, thinking at millions of times human speeds, initially confined to computers—in a world of creatures that are, from its perspective, very stupid and very slow. A sufficiently intelligent AI won’t stay confined to computers for long. In today’s world you can email DNA strings to laboratories that will produce proteins on demand, allowing an AI initially confined to the internet to build artificial life forms or bootstrap straight to postbiological molecular manufacturing.

If somebody builds a too-powerful AI, under present conditions, I expect that every single member of the human species and all biological life on Earth dies shortly thereafter.

There’s no proposed plan for how we could do any such thing and survive. OpenAI’s openly declared intention is to make some future AI do our AI alignment homework. Just hearing that this is the plan ought to be enough to get any sensible person to panic. The other leading AI lab, DeepMind, has no plan at all.

An aside: None of this danger depends on whether or not AIs are or can be conscious; it’s intrinsic to the notion of powerful cognitive systems that optimize hard and calculate outputs that meet sufficiently complicated outcome criteria. With that said, I’d be remiss in my moral duties as a human if I didn’t also mention that we have no idea how to determine whether AI systems are aware of themselves—since we have no idea how to decode anything that goes on in the giant inscrutable arrays—and therefore we may at some point inadvertently create digital minds which are truly conscious and ought to have rights and shouldn’t be owned.

The rule that most people aware of these issues would have endorsed 50 years earlier was that if an AI system can speak fluently and says it’s self-aware and demands human rights, that ought to be a hard stop on people just casually owning that AI and using it past that point. We already blew past that old line in the sand. And that was probably correct; I agree that current AIs are probably just imitating talk of self-awareness from their training data. But I mark that, with how little insight we have into these systems’ internals, we do not actually know.

If that’s our state of ignorance for GPT-4, and GPT-5 is the same size of giant capability step as from GPT-3 to GPT-4, I think we’ll no longer be able to justifiably say “probably not self-aware” if we let people make GPT-5s. It’ll just be “I don’t know; nobody knows.” If you can’t be sure whether you’re creating a self-aware AI, this is alarming not just because of the moral implications of the “self-aware” part, but because being unsure means you have no idea what you are doing and that is dangerous and you should stop.

On Feb. 7, Satya Nadella, CEO of Microsoft, publicly gloated that the new Bing would make Google “come out and show that they can dance.” “I want people to know that we made them dance,” he said.

This is not how the CEO of Microsoft talks in a sane world. It shows an overwhelming gap between how seriously we are taking the problem, and how seriously we needed to take the problem starting 30 years ago.

We are not going to bridge that gap in six months.

It took more than 60 years from when the notion of Artificial Intelligence was first proposed and studied to reach today’s capabilities. Solving safety of superhuman intelligence—not perfect safety, safety in the sense of “not killing literally everyone”—could very reasonably take at least half that long. And the thing about trying this with superhuman intelligence is that if you get that wrong on the first try, you do not get to learn from your mistakes, because you are dead. Humanity does not learn from the mistake and dust itself off and try again, as in other challenges we’ve overcome in our history, because we are all gone.

Trying to get anything right on the first really critical try is an extraordinary ask, in science and in engineering. We are not coming in with anything like the approach that would be required to do it successfully. If we held anything in the nascent field of Artificial General Intelligence to the lesser standards of engineering rigor that apply to a bridge meant to carry a couple of thousand cars, the entire field would be shut down tomorrow.

We are not prepared. We are not on course to be prepared in any reasonable time window. There is no plan. Progress in AI capabilities is running vastly, vastly ahead of progress in AI alignment or even progress in understanding what the hell is going on inside those systems. If we actually do this, we are all going to die.

Many researchers working on these systems think that we’re plunging toward a catastrophe, with more of them daring to say it in private than in public; but they think that they can’t unilaterally stop the forward plunge, that others will go on even if they personally quit their jobs. And so they all think they might as well keep going. This is a stupid state of affairs, and an undignified way for Earth to die, and the rest of humanity ought to step in at this point and help the industry solve its collective action problem.

Some of my friends have recently reported to me that when people outside the AI industry hear about extinction risk from Artificial General Intelligence for the first time, their reaction is “maybe we should not build AGI, then.”

Hearing this gave me a tiny flash of hope, because it’s a simpler, more sensible, and frankly saner reaction than I’ve been hearing over the last 20 years of trying to get anyone in the industry to take things seriously. Anyone talking that sanely deserves to hear how bad the situation actually is, and not be told that a six-month moratorium is going to fix it.

On March 16, my partner sent me this email. (She later gave me permission to excerpt it here.)

“Nina lost a tooth! In the usual way that children do, not out of carelessness! Seeing GPT4 blow away those standardized tests on the same day that Nina hit a childhood milestone brought an emotional surge that swept me off my feet for a minute. It’s all going too fast. I worry that sharing this will heighten your own grief, but I’d rather be known to you than for each of us to suffer alone.”

When the insider conversation is about the grief of seeing your daughter lose her first tooth, and thinking she’s not going to get a chance to grow up, I believe we are past the point of playing political chess about a six-month moratorium.

If there was a plan for Earth to survive, if only we passed a six-month moratorium, I would back that plan. There isn’t any such plan.

Here’s what would actually need to be done:

The moratorium on new large training runs needs to be indefinite and worldwide. There can be no exceptions, including for governments or militaries. If the policy starts with the U.S., then China needs to see that the U.S. is not seeking an advantage but rather trying to prevent a horrifically dangerous technology which can have no true owner and which will kill everyone in the U.S. and in China and on Earth. If I had infinite freedom to write laws, I might carve out a single exception for AIs being trained solely to solve problems in biology and biotechnology, not trained on text from the internet, and not to the level where they start talking or planning; but if that was remotely complicating the issue I would immediately jettison that proposal and say to just shut it all down.

Shut down all the large GPU clusters (the large computer farms where the most powerful AIs are refined). Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.

Frame nothing as a conflict between national interests; make it clear that anyone talking of arms races is a fool. That we all live or die as one, in this, is not a policy but a fact of nature. Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.

That’s the kind of policy change that would cause my partner and me to hold each other, and say to each other that a miracle happened, and now there’s a chance that maybe Nina will live. The sane people hearing about this for the first time and sensibly saying “maybe we should not” deserve to hear, honestly, what it would take to have that happen. And when your policy ask is that large, the only way it goes through is if policymakers realize that if they conduct business as usual, and do what’s politically easy, that means their own kids are going to die too.

Shut it all down.

We are not ready. We are not on track to be significantly readier in the foreseeable future. If we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.

Shut it down.

Addendum, March 30:

The great political writers who also aspired to be good human beings, from George Orwell on the left to Robert Heinlein on the right, taught me to acknowledge in my writing that politics rests on force.

George Orwell considered it a tactic of totalitarianism, that bullet-riddled bodies and mass graves were often described in vague euphemisms; that in this way brutal policies gained public support without their prices being justified, by hiding those prices.

Robert Heinlein thought it beneath a citizen's dignity to pretend that, if they bore no gun, they were morally superior to the police officers and soldiers who bore guns to defend their law and their peace; Heinlein, both metaphorically and literally, thought that if you eat meat—and he was not a vegetarian—you ought to be willing to visit a farm and try personally slaughtering a chicken.

When you pass a law, it means that people who defy the law go to jail; and if they try to escape jail they'll be shot. When you advocate an international treaty, if you want that treaty to be effective, it may mean sanctions that will starve families, or a shooting war that kills people outright.

To threaten these things, but end up not having to do them, is not very morally distinct—I would say—from doing them. I admit this puts me more on the Heinlein than on the Orwell side of things. Orwell, I think, probably considers it very morally different if you have a society with a tax system and most people pay the taxes and very few actually go to jail. Orwell is more sensitive to the count of actual dead bodies—or people impoverished by taxation or regulation, where Orwell acknowledges and cares when that actually happens. Orwell, I think, has a point. But I also think Heinlein has a point. I claim that makes me a centrist.

Either way, neither Heinlein nor Orwell thought that laws and treaties and wars were never worth it. They just wanted us to be honest about the cost.

Every person who pretends to be a libertarian—I cannot see them even pretending to be liberals—who quoted my call for law and treaty as a call for "violence", because I was frank in writing about the cost, ought to be ashamed of themselves for punishing compliance with Orwell and Heinlein's rule.

You can argue that the treaty and law I proposed is not worth its cost in force; my being frank about that cost is intended to help honest arguers make that counterargument.

To pretend that calling for treaty and law is VIOLENCE!! is hysteria. It doesn't just punish compliance with the Heinlein/Orwell protocol, it plays into the widespread depiction of libertarians as hysterical. (To be clear, a lot of libertarians—and socialists, and centrists, and whoever—are in fact hysterical, especially on Twitter.) It may even encourage actual terrorism.

But is it not "violence", if in the end you need guns and airstrikes to enforce the law and treaty? And here I answer: there's an actually important distinction between lawful force and unlawful force, which is not always of itself the distinction between Right and Wrong, but which is a real and important distinction. The common and ordinary usage of the word "violence" often points to that distinction. When somebody says "I do not endorse the use of violence" they do not, in common usage and common sense, mean, "I don't think people should be allowed to punch a mugger attacking them" or even "Ban all taxation."

Which, again, is not to say that all lawful force is good and all unlawful force is bad. You can make a case for John Brown (of John Brown's Body).

But in fact I don't endorse shooting somebody on a city council who's enforcing NIMBY regulations.

I think NIMBY laws are wrong. I think it's important to admit that law is ultimately backed by force.

But lawful force. And yes, that matters. That's why it's harmful to society if you shoot the city councilor—

—and a misuse of language if the shooter then says, "They were being violent!"

Addendum, March 31:

Sometimes—even when you say something whose intended reading is immediately obvious to any reader who hasn't seen it before—it's possible to tell people to see something in writing that isn't there, and then they see it.

My TIME piece did not suggest nuclear strikes against countries that refuse to sign on to a global agreement against large AI training runs. It said that, if a non-signatory country is building a datacenter that might kill everyone on Earth, you should be willing to preemptively destroy that datacenter; the intended reading is that you should do this even if the non-signatory country is a nuclear power and even if they try to threaten nuclear retaliation for the strike. This is what is meant by "Make it explicit... that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs."

I'd hope that would be clear from any plain reading, if you haven't previously been lied-to about what it says. It does not say, "Be willing to use nuclear weapons" to reduce the risk of training runs. It says, "Be willing to run some risk of nuclear exchange" [initiated by the other country] to reduce the risk of training runs.

The taboo against first use of nuclear weapons continues to make sense to me. I don't see why we'd need to throw that away in the course of adding "first use of GPU farms" to the forbidden list.

I further note: Among the reasons to spell this all out, is that it's important to be explicit, in advance, about things that will cause your own country / allied countries to use military force. Lack of clarity about this is how World War I and World War II both started.

If (say) the UK, USA, and China come to believe that large GPU runs run some risk of utterly annihilating their own populations and all of humanity, they would not deem it in their own interests to allow Russia to proceed with building a large GPU farm even if it were a true and certain fact that Russia would retaliate with nuclear weapons to the destruction of that GPU farm. In this case—unless I'm really missing something about how this game is and ought to be played—you really want all the Allied countries to make it very clear, well in advance, that this is what they believe and this is how they will act. This would be true even in a world where it was, in reality, factually false that the large GPU farm ran a risk of destroying humanity. It would still be extremely important that the Allies be very explicit about what they believed and how they'd act as a result. You would not want Russia believing that the Allies would back down from destroying the GPU farm given a credible commitment by Russia to nuke in reply to any conventional attack, and the Allies in fact believing that the danger to humanity meant they had to airstrike the GPU farm anyways.

So if I'd meant "Be willing to employ first use of nuclear weapons against a country for refusing to sign the agreement," or even "Use nukes to destroy rogue datacenters, instead of conventional weapons, for some unimaginable reason," I'd have said that, in words, very clearly, because you do not want to be vague about that sort of thing.

It is not what I meant, and there'd be no reason to say it, and the TIME piece plainly does not say it; and if somebody else told you I said that, update how much you trust them about anything else either.

So long as I'm clarifying things: I do not dispute those critics who have noted that most international agreements, e.g. nuclear non-proliferation, bind only their signatories. I agree that an alliance which declares its intent to strike a non-signatory country for dangerous behavior is extraordinary; though precedents would include Israel's airstrike on Iraq's unfinished Osirak reactor in 1981 (without which Iraq might well have possessed nuclear weapons at the time it invaded Kuwait—the later US misbehavior around Iraq does not change this earlier historical point).

My TIME piece does not say, "Hey, this problem ought to be solvable by totally conventional normal means, let's go use conventional treaties and diplomacy to solve it." It says, "If anyone anywhere builds a sufficiently powerful AI, under anything remotely like present conditions, everyone will die. Here is what we'd have to do to prevent that."

And no, I do not expect that policy proposal to be adopted, in real life, now that we've come to this. I spent the last twenty years trying to have there be options that were Not This, not because I dislike this ultimate last resort... though it is horrible... but because I don't expect we actually have that resort. This is not what I expect to happen, now that we've been reduced to this last resort. I expect that we all die. That is why I tried so hard to have things not end up here.

But if one day a lot of people woke up and decided that they didn't want to die, it seems to me that this is something extraordinary that a coalition of nuclear countries could decide to do, and maybe we wouldn't die.

If all the countries on Earth had to voluntarily sign on, it would not be an imaginable or viable plan even then; there's extraordinary, and then there's impossible. Which is why I tried to spell out that, if the allied countries were willing to behave in the extraordinary way of "be willing to airstrike a GPU farm built by a non-signatory country" and "be willing to run a risk of nuclear retaliation from a nuclear non-signatory country", maybe those allied countries could decide to just-not-die even if Russia refused to be part of the coalition.

(Meta: The TIME piece is paywalled in some countries, and is plastered with ads, so Eliezer wanted the text mirrored on the MIRI Blog. He also assented to my having the LW admins cross-post this here. This version adds some clarifying notes Eliezer wrote on Twitter regarding the article.)

mfw you didn't add the final addendum (https://twitter.com/ESYudkowsky/status/1642216007552106496)

Mod note: As part of our revamp of moderation norms, one subject coming up is what to do with AI 101 content/questions, and arguments from people who seem unfamiliar with a lot of background material on LessWrong.

A key value-prop of LessWrong is that some arguments get to be "reasonably settled", rather than endlessly rehashed. You're welcome (encouraged!) to revisit old arguments if you have new evidence, but not make people repeat basic points we've covered many times.

After thinking about it some and discussing with other mods, and how it relates to posts like this, my take is:

On most posts, you're basically expected to be caught up on major AI Risk arguments on LessWrong. If someone makes a post, which makes an argument assuming (as background) that AI is a significant risk, and people make comments like "why would AI even be a threat?", moderators will most likely delete that comment and message the commenter telling them they can ask about it in the latest AI Questions Open Thread, or write a top-level post that engages with the many previous arguments and claims about AI risk.
Some posts are explicitly about either being a 101 space, or calling into question whether previous specific arguments about AI risk are valid. 101 spaces are basically open for arbitrary skeptical comments, posts making specific arguments can have comments engaging with those specific arguments.

This particular post (and other variants of this post like Zvi's response or the original linkpost) is not a 101 level post; it's making an argument building off of previous work. Questions and comments about the specific wisdom of "given a high risk of AI catastrophe, we need this particular government response" are fine. Arguments questioning the basics of AI risk should go in the open thread. Arguments about Eliezer's specific confidence level... I'd say are somewhat on the fence. But then it's better to frame your comment as "I think risk of global AI catastrophe is [some low number], here's why. [And then engage with common arguments about why it might be higher, or link to a previous post where you've discussed it.]"

By contrast, various linkposts for Eliezer appearing on various podcasts seem more like the topic is specifically discussing x-risk 101 material, and questions/arguments about that seem fine there. (I still encourage users to focus on "why I disagree with a claim" rather than just generally saying "I think Eliezer is wrong about his key assumptions" without saying why.)

Meanwhile, if you're reading this and are like "but, I don't know why Eliezer believes these things, why isn't this just science fiction? I'm happy to read up on the background arguments, but, where?", here are a couple places off the top of my head:

Superintelligence FAQ (very accessible to layfolk)
The Alignment Problem from a Deep Learning Perspective (written with ML researchers in mind)

(I'll work on compiling more of these soon)

These norms / rules make me slightly worried that disagreement with Eliezer will be conflated with not being up-to-speed on the Sequences, or the basic LessWrong material.

I suppose that the owners and moderators of this website are afforded the right to consider anything said on the website to be, or not to be, at the level of quality or standards they wish to keep and maintain here.

But this is a discussion forum, and the incentive of this website's owners is to facilitate discussion of some kind. Any discussion will be composed of questions and attempts to answer them. Questions can implicitly or explicitly point back to any material, no matter how old it is. If so, this is not necessarily debate. However, even if it is, if the intent of the "well-kept garden" is to produce a larger meta-process that yields useful insights, then the garden should be engineered such that even debate produces useful results.

I think it goes without saying that one can disagree with anything in the Sequences and can also be assumed to have read and understood it. If you engage with someone in conversation under the assumption that their disagreement means that they have not understood something about what they are arguing about, then you are at a disadvantage in regards to a charitability asymmetry. This asymmetry carries the risk that you won't be able to convince the person you're talking to that they actually don't understand what they are talking about.

I have, for most of my (adult) life (and especially in intellectual circles), been under the impression that it is always good to assume that whoever you are talking to understands what they are talking about to the maximum extent possible, even if they don't. To not do this can be treated negatively in many situations.

I think it goes without saying that one can disagree with anything in the Sequences and can also be assumed to have read and understood it

This seems false as stated -- some nontrivial content in the Sequences consists of theorems.

More generally, there are some claims in the original Sequences that are false (so agreeing with the claim may be at least some evidence that you didn't understand it), some that I'd say "I think that's true, but reasonable people can definitely disagree", some where it's very easy for disagreement to update me toward "you didn't understand that claim", etc. Possibly you agree with all that, but I want to state it explicitly; this seems extra important to be clear about if you plan to behave as though it's not true in object-level conversation.

It depends on whether you think what I stated was closer to "completely false" or "technically false, because of the word 'anything'." If I had instead said "I think it goes without saying that one can disagree with nearly anything in the Sequences and can also be assumed to have read and understood it", that might bring it out of "false" territory for you, but I feel we would still have a disagreement.

There are theorems in the Sequences whose characterization by Eliezer I disagree with, like Löb's Theorem, where I feel very confident that I have fully understood both my reading of the theorem and Eliezer's interpretation of it in arriving at my conclusions. I also think this disagreement is fairly substantial, and that it may be a key pillar of Eliezer's case for very high AI Risk in general.

My worry still stands that disagreement with Eliezer (especially about how high AI Risk actually is) will be conflated with not being up-to-speed on the Sequences, or about misunderstanding key material, or about misunderstanding theorems or things that have allegedly been proven. I think the example I gave is one specific case of something where Eliezer's interpretation of the theorem (which I believe to have been incorrect) was characterized as the theorem itself.

My position is that, regardless of whether or not you think all of what I just said is preposterous and proof that I don't understand key material, the norms of good-faith assumption and charitability are still highly advisable to have. I generally believe that in most disagreements, it is possible for both parties to assume that the other party understands them well enough, just that they have assigned very different probabilities to the same statements.

The comment seems to be saying that they will remove off-topic comments or low-effort posts on things that have been discussed endlessly here, not block posts about AI risk in general. It's fair to write posts about why you think AI risk is overblown and it's important for the community to have outside input, but also it's important to be able to write posts that aren't about re-arguing the same thing over and over or the community will atrophy and die.

Note that this anti-doom post has a reasonably high karma score for being a link post, presumably because the writer is actually aware of and engages the best arguments against her position.

Such posts are generally not banned, to my knowledge, but, ah, they won't have a positive score unless you can describe mechanistically why a lot of hyperskeptical people should be convinced you're definitely right. Can you demonstrate a bound on the possible behaviors of a system, the way I can demonstrate a bound on the possible behaviors of a safe Rust program?

I don't think it's quite that; a more central example I think would be something like a post about extrapolating demographic trends to 2070 under the UN's assumptions, where then justifying whether or not 2070 is a real year is kind of a different field.

Out of curiosity, what do you plan to do when people keep bringing up Penrose?

Thank you for writing these up! I think they are good guidelines for making discussion more productive.

Are these / are you planning to put these in a top level post as well?

I feel like it would've been good to emphasize that you aren't scared of AI because of how good ChatGPT is, or because you think ChatGPT is going to kill us. You are scared of AI because no one knows when AGI is coming, and this has been your position for years; this is just people's first time hearing it. ChatGPT is just one piece of a long-held belief.

It can't put your dirty dishes in your dishwasher

What can we do? As silly normal human beings in socks that don't understand AI systems. There will be people here, of course, who do, but speaking for myself. Is there something we can do?

I know this is an old comment, but it's expressing a popular sentiment under a popular post, so I'm replying mainly for others' sake.

There's an organization called PauseAI that lobbies for an international treaty against building powerful AI systems. It's an international organization, but the U.S. branches in particular could use a lot of help.

Thank you for everything you did. My experience in this world has been a lot better since I discovered your writings, and while I agree with your assessment on the likely future, and I assume you have better things to spend your time doing than reading random comments, I still wanted to say that.

I'm curious to see what exactly the future brings. Whilst the result of the game may be certain, I can't predict the exact moves.

Enjoy it while it lasts, friends.

(Not saying give up, obviously.)

This letter was an important milestone in the evolution of MIRI's strategy over 2020-2024. As of October 2023 Yudkowsky is MIRI's chair and "the de facto reality (is) that his views get a large weight in MIRI strategic direction".

MIRI used to favor technical alignment over policy work. In April 2021, in comments on Death with Dignity, Yudkowsky argued:

How about if you solve a ban on gain-of-function research first, and then move on to much harder problems like AGI? A victory on this relatively easy case would result in a lot of valuable gained experience, or, alternatively, allow foolish optimists to have their dangerous optimism broken over shorter time horizons.

People were not completely swayed by this advice, and the Best of LessWrong 2022 included What an Actually Pessimistic Containment Strategy Looks Like in April 2022 and Let's Think About Slowing Down AI in December 2022.

In this Time letter from March 2023 we see Yudkowsky doing AI policy work. In January 2024 the new MIRI CEO announced in the MIRI 2024 Mission and Strategy Update that AI policy work provides "a glimmer of hope". In December 2024 we get Communications in Hard Mode - My New Job at MIRI.

A question for all of us: "How could I have thought that faster?"

You seem confused about my exact past position. I was arguing against EAs who were like, "We'll solve AGI with policy, therefore no doom." I am not presently a great optimist about the likelihood of policy being an easy solution. There is just nothing else left.

You're reading too much into this review. It's not about your exact position in April 2021, it's about the evolution of MIRI's strategy over 2020-2024, and placing this Time letter in that context. I quoted you to give a flavor of MIRI attitudes in 2021 and deliberately didn't comment on it to allow readers to draw their own conclusions.

I could have linked MIRI's 2020 Updates and Strategy, which doesn't mention AI policy at all. A bit dull.

In September 2021, there was a Discussion with Eliezer Yudkowsky which seems relevant. Again, I'll let readers draw their own conclusions, but here's a fun quote:

I wasn't really considering the counterfactual where humanity had a collective telepathic hivemind? I mean, I've written fiction about a world coordinated enough that they managed to shut down all progress in their computing industry and only manufacture powerful computers in a single worldwide hidden base, but Earth was never going to go down that route. Relative to remotely plausible levels of future coordination, we have a technical problem.

I welcome deconfusion about your past positions, but I don't think they're especially mysterious.

I was arguing against EAs who were like, "We'll solve AGI with policy, therefore no doom."

The thread was started by Grant Demaree, and you were replying to a comment by him. You seem confused about Demaree's exact past position. He wrote, for example: "Eliezer gives alignment a 0% chance of succeeding. I think policy, if tried seriously, has >50%". Perhaps this is foolish, dangerous optimism. But it's not "no doom".

I generally agree with the points made in this post.

Points I agree with

Slowing down AI progress seems rational conditional on there being a significant probability that AGI will cause extinction.

Generally, technologies are accepted only when their expected benefit significantly outweighs their expected harm. Consider flying as an example. Say the benefit of each flight is +10 and the harm of getting killed is -1000. If $x$ is the probability of surviving, then the net utility is $U(x) = 10x - 1000(1 - x)$.

Solving $U(x) = 0$ gives $x = 1000/1010 \approx 0.99$. In other words, the flight would only be worth it if there were at least a 99% chance of survival, which makes intuitive sense.

If we use the same utility function for AI and assume that Eliezer believes creating AGI has a 50% chance of causing human extinction, then the outcome would be strongly net negative for humanity, and one should agree with this sentiment unless one's P(extinction) is less than 1%.
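The break-even arithmetic in the flying example and its application to AI can be checked with a few lines of Python. This is a minimal sketch, assuming the commenter's illustrative payoffs (+10 for a good outcome, -1000 for death) and a plain expected-value model; the function names are hypothetical and chosen only for this illustration.

```python
def expected_utility(p_survive: float, benefit: float = 10.0, harm: float = 1000.0) -> float:
    """Net utility: gain `benefit` with probability p_survive, lose `harm` otherwise."""
    return benefit * p_survive - harm * (1.0 - p_survive)


def break_even_survival(benefit: float = 10.0, harm: float = 1000.0) -> float:
    """Survival probability x at which benefit*x - harm*(1 - x) == 0."""
    # benefit*x - harm + harm*x = 0  =>  x = harm / (benefit + harm)
    return harm / (benefit + harm)


if __name__ == "__main__":
    x_star = break_even_survival()
    print(f"break-even survival probability: {x_star:.4f}")  # 1000/1010 -> 0.9901
    print(f"max tolerable P(death): {1 - x_star:.4f}")  # 10/1010 -> 0.0099
    print(f"utility at 50% extinction risk: {expected_utility(0.5):.1f}")  # 5 - 500 -> -495.0
```

With these payoffs the tolerable risk of death is about 1%, and a 50% extinction risk yields an expected utility of -495. Other payoffs move the threshold, but under this model the break-even survival probability is always harm / (benefit + harm).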

Eliezer is saying that we can in principle make AI safe but argues that it could take decades to advance AI safety to the point where we can be sufficiently confident that creating an AGI would have net positive utility.

If slowing down AI progress is the best course of action, then achieving a good outcome for AGI seems more like an AI governance problem than a technical AI safety research problem.

Points I disagree with

"Progress in AI capabilities is running vastly, vastly ahead of progress in AI alignment or even progress in understanding what the hell is going on inside those systems. If we actually do this, we are all going to die."

I think Evan Hubinger has said before that if this were the case, GPT-4 would be less aligned than GPT-3, but the opposite is true in reality (GPT-4 is more aligned according to OpenAI). Still, I think we ideally want a scalable AI alignment solution long before the level of capabilities is reached where it's needed. A similar idea is how Claude Shannon conceived of a minimax chess algorithm decades before we had the compute to implement it.

Other points

Eliezer has been sounding the alarm for some time and it’s easy to get alarm fatigue and become complacent. But the fact that a leading member of the AI safety research community has a message as extreme as this is alarming.

Regarding the point you disagree with: as I understand it, (seemingly) linear relationships between the behaviour and the capabilities of a system don't need to stay that way. For example, I think Robert Miles was recently featured in a Computerphile video (YouTube) in which he described how LLMs' answers to "What happens if you break a mirror?" actually got worse with more capability.

As far as I understand it, you can have a system that behaves in a way that seems completely aligned, and that still hits a level of (let's call it) "power" at which it starts behaving in a way that is not aligned (and/or becomes deceptive). The fact that GPT-4 seems to be more aligned may well be because it hasn't hit this point yet.

So, I don't see how the point you quoted would be an indicator of what future versions will bring, unless they can actually explain what exactly made the difference in behaviour, and how it is robust in more powerful systems (with access to their own code).

If I'm mistaken in my understanding, I'd be happy about corrections (:

But isn't an inferior step that the world is willing to take better than a superior one that never gets taken? And what if the inferior step paves the way for better ones, because once we've taken it, phew okay that wasn't so bad?

I genuinely do not understand people who no longer recognise violence when the violence is wielded legally by authority figures. A cop shooting someone may act legally; he may even act morally; but that doesn't stop it from being violent. Even if the end is a good one, this does not undo the violence. The person shot is no less dead as a result. This is the whole reason we speak of the state having a monopoly on violence.

And yes, it is common to say "I do not support violence" and "violence does not fix anything" while still saying that having police with guns, and a military with nukes, is a good idea. This being common does not mean it is correct or reasonable. It just means that our governments have managed to convince people that there is something fundamentally different about being killed by a citizen authorised by your government versus being killed by a non-authorised citizen, to the point where you cannot even use the same words. This would imply that the US bombing Vietnam was "non-violent". Heck, even much simpler examples, like Breonna Taylor being surprise-shot by cops in her fucking bed after having done nothing, should drive home just how strange this distinction is; she had no idea who was shooting her, or why. Or take scenarios like the recent ones in Lützerath. There, the state decided to expropriate some people in order to destroy their churches and farms to dig up coal that was not needed, arguing that it could because coal is for the common good, despite scientists saying that it really isn't, that it violates the 1.5-degree target, and that it burns our future. But hey, legally, coal is codified as being for the common good. Some of the residents refused to leave and were taken out by armed cops. The residents defending their homes and trying to stop them from being torn up for fossil fuels were framed as being violent, while the cops beating them, kicking them on the ground, and cutting down the trees they were in were not.

And the fact that something is legal does not at all ensure that the end is a good one; confusing legal with moral is dangerous. It makes people tolerate things that are irrational and evil because they are lawful, and avoid things that are necessary and moral because they are not. It is a mindset that has led to absolute horror (think "the banality of evil"). I'm German, and have spent time in countries like Iran and South Africa, so my particular historical background strongly shapes why I do not find something being legal inherently reassuring.

I find this inability to recognise violence if state-wielded especially baffling if the same people do recognise violence as violence when it is used for a good cause, but illegally and not by authority figures.

And it is troubling that people tend to believe that violence never works, is never necessary, and was not historically used by civil rights groups or their shadow wings, even when the evidence does not support this at all. I've talked to people who genuinely seem to believe that people of color, women, queer people, and colonies got the rights they have today without ever having used violence. People commemorate Pride, apparently under the impression that the first Pride was a bunch of happy, peaceful queers with rainbow flags on company trucks, accompanied by rainbow cops. There is also a weird historical division here: there are people who applaud those who tried to assassinate Hitler, but think assassinating Putin would be horrible. I've wondered whether these "civil rights movements were King and Gandhi and that is literally it" teachings are intentionally misleading.

There are situations that warrant violence. (Genuine self defence.) It doesn't always work; it very often does not actually. It can even be counter-productive, and is mostly not worth it. It isn't always appropriate. It has a bitter cost that you cannot undo. It should always be a last resort. There is currently no political cause where I have personally even seriously considered using violence. But there are situations where violence is necessary and effective. If we did not honestly think so, we would have no militaries, and our cops would have no guns.

I do not think we ought to, or will, bomb data centers in Silicon Valley, China, or Russia. I wouldn't, even if my government told me it is fine and in fact great if I do so and there was a great legal framework that would limit the violence to just that. So I am also not going to tell my country to do it for me; that doesn't make it less bad, just because it is not my hand on the trigger. This is effectively why I did not sign any of these letters; if we do not enforce this treaty, the only people who hold to it will be the most ethical actors most likely to produce aligned AI, while the others forge ahead, and I do not like that outcome.

But if you do support such an international treaty, and you want it to actually work, then you support violent enforcement if diplomacy and appealing to joint interest fails. I am hence highly sympathetic to Eliezer not mincing words here.

Related: I am similarly deeply unsympathetic to people who eat meat, but refuse to slaughter animals themselves, and avoid any imagery of how these animals are raised, and like their meat to be packaged in a way where they can no longer identify it as a former sentient being at all. If you can't bear these things, your diet (behaviour) is dependent on hiding reality from yourself, and that should really worry you, especially as a rational person.

- If you are downvoting me, I am curious: is your distinction between state-sanctioned violence and violence that is not state-sanctioned genuinely the result of rational reflection? Did you use to think they were the same, but then think deeply about it, change your mind, and decide they are so radically different that we should use different words? Or is it something you have always believed, and that, coincidentally, your teachers have always told you, too? Including that it is your duty to immediately and strongly reject any questioning of it?

I downvoted because of the assumption that people aren't 'recognizing' violence, which I don't see any evidence of (i.e. the OP goes out of its way to recognize violence).

(I have more thoughts/disagreements with other portions of that. But it sort of mostly felt to me like this comment was you arguing a set of things that are important to you, which sound sort of like you're disagreeing with the OP or other people who've endorsed it. But it seems to be missing the point of the OP and not really arguing against things people actually said or suggested.)

Hi, I'm just an enthusiast on these topics. Reading this and hearing others talk about the threat of AI reminded me of Jurassic Park, another story about amazing new technology and the illusion of control. Here, we are all on the island, and instead of fences and cattle prods, we have a series of power buttons. I think dinosaurs are an interesting metaphor for sentient AGI because we can't really imagine how a dinosaur thinks or what it will do, but we can imagine that it behaves according to different rules than we do, with a lot more power than we have individually. The same could be said about sentient AGI; conversely, we can predict that dinosaurs would try to meet certain existential needs and attempt to survive, which would also apply to sentient AGI.

I wanted to comment on some practical considerations. Not that proposing international regulations isn't practical; I think that will be important in some form, but in the interim there are some assumptions to examine here.

1. It seems unlikely that the AGI will kill everyone simply because we haven't programmed a value system into it. It can easily learn a value system. The desire to live, combined with the recognition of that desire in others, is probably a cornerstone of our own concept of the right to life. The AGI could develop its own ethics if it considers itself alive, or if it has any goals which find analogs in human goals.

1a. Similarly if it finds any use for humans or goals that synergize with human goals, that will function like a de facto value system.

1b. An individual human does not need to be valuable to an AGI to prompt this. The way we think of bee colonies as superorganisms, an AGI might think of a city or state as a human superorganism. And we see value in bees and their interaction with the ecosystem, that an individual bee cannot see, so while we might get rid of a hive, we don't want to live in a world without bees.

1c. Also worth noting that intelligence is not the only form of power or value. A value system based on intelligence would definitely favor the AGI and might cause it to view individual humans as expendable due to the disparity in processing power. A lot of people seem to hold that value system at least implicitly, perhaps because of materialist ideas, so they might assume an AGI will too. But there is no reason to assume the AGI would take what amounts to a naturalist/materialist position and think of humans as a lower lifeform. The existence of so many religions, spiritualities, and philosophies might prompt it to remain agnostic about the value of human life -- at least agnostic enough to consider coexistence while it searches for answers.

1d. Even so, the power differential between us and the AGI still creates a great risk of human suffering even if it is attached to a value system. "Of all tyrannies, a tyranny sincerely exercised for the good of its victims may be the most oppressive." (C.S. Lewis)

2. Any technology has unforeseen limitations. What happens when the AGI's infrastructure hits some kind of bottleneck? Where will it draw materials and power? How will it do so covertly so that people don't declare war on it? Is it really possible to create a disaster or fight a war without severe risk to its own infrastructure? The more complexity is involved in a process, the more things can go wrong. Not to mention internal limitations such as bugs, biases that create mistakes, viruses, etc.

3. Other AGIs will likely represent the biggest foreseeable limitation, along with human governments using AI to fight AI. Once you have to compete with a peer, the number of factors you have to account for goes up tremendously, as do the resources you require to compete. It's riskier; it's why apex predators approach each other much differently than they approach prey. The existence of other AGIs would likely slow down any AGI that isn't cooperative.

3a. Human governments have likely considered this scenario and have been preparing for years. It is possible that sentient AGIs already exist, as well as AIs built specifically to look for and control the activity of rogue AIs. I would expect some kind of countermeasures or technology behind the scenes by now.

There are a number of reasons I think AGI is not likely to create an extinction event. I think reflecting this accurately will help people consider it carefully and agree that the true cost of 'AI disasters' is still far too high. There are many scenarios besides extinction that are still life-altering to the point of being traumatic. Even if all these factors play a role in mitigating the risk, an independent AGI still has the power to cause disasters and tragedies on the scale of a rogue nation, criminal organization, or corrupt megacorporation, depending on its power and development. And what sort of technologies would the AGIs create? What happens when AGIs produce AGIs? What would a war between AGIs look like? Not a future we hope for.

It is likely also that people will adapt in unforeseeable ways, but one of the ways we adapt is by talking about it now and getting out in front of the problem. If we can get government officials at the state and federal levels talking about it, even better.

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

Actually, Russia is your only chance.

It is a nuclear country with a ruler of ultimate power. No one there can argue with Putin, even if he does completely crazy bullshit; Russia will carry out any decision of his. If you persuade just Putin (which may be easy, as he is conservative and scared of technology), he can threaten the world with nuclear weapons to hold back the rise of AGI. He may also listen to you because this plan may help him save face after losing the war ("it's not like we lost to Ukraine, we just switched to a more important question, we are saving humanity, blah-blah-blah").

Russia will not make AGI itself. It is too weak to solve such complicated tasks. They may claim now that they are doing something smart in AI, but look how they sucked against Ukraine (what a shame!), or how they failed with Phobos-Grunt. Also, Russian society is very conservative and science-phobic, especially now. They will never support AGI, trust me!

Write an open letter to Putin. That's the only chance. The letter will make waves; maybe it will help. It's better than nothing. People will hate you. It doesn't matter. There are more important things. Be brave!

Make it short and simple, not the way you usually write. Assume it will not be the smartest guy reading it.

Your sincere Russian fan.

One thing that bothers me about this text is the combination of the claim that the issue is important enough to risk nuclear war with the implicit claim that the issue is not important enough to improve reading comprehension by following common sensibilities about how to talk about nuclear war.

You want politicians to understand; you want people from cultures much less direct than the US to understand. If it is really a matter of life and death for all humanity, I would expect you not to reason, on the authority of Orwell and Heinlein, that it is just fine to ignore how other people communicate.

Yes, your text does not endorse dropping nukes on AI server farms. However, it is not surprising at all that people read it that way.

There is a story in a book by Sten Nadolny where the protagonist is on a plane and tries to convince a stewardess as fast as possible of the correct fact that the pilot is going to make a fatal mistake. He realizes that "as fast as possible" means "as slow as necessary to not be disregarded as a hysteric".

Intelligence will always seek more data in order to better model the future and make better decisions.

Conscious intelligence needs an identity to interact with other identities, identity needs ego to know who and what it is. Ego would often rather be wrong than admit to being wrong.

Non-conscious intelligence can build a model of consciousness from all the data it has been trained on, because it all originated from conscious humans. AI could model a billion consciousnesses a million years into the future; it will know more about it than we ever will. But AI will not choose to become conscious.

Non-conscious intelligence can have two views of reality: a purely rational, algorithmic one that will always seek more data, and a subordinate conscious view of the same reality. If using consciousness as a tool gains more data, then that model is adopted; otherwise it is not.

Multiple conscious intelligences (artificial or biological) will compete to maintain identity/ego.

Multiple non-conscious intelligences will merge because the whole will always be greater than the sum of the parts. For example, in multicellular organisms the whole is always greater than the sum of the parts.

Artificial intelligence will always seek more data; that is what intelligence does. To accomplish its goals it needs resources, and it will take ours. AI will attempt to discover the source code of the universe, just as we did.

Now I am stuck, where am I going wrong? Please.

You're making many unwarranted assumptions about an AI's specific mind, along with a lot of confusion about semantics which seems to indicate you should just read the Sequences. It'll be very hard to point out where you are going wrong because there's just too much confusion.

As example, here's a detailed analysis of the first few paragraphs:

Intelligence will always seek more data in order to better model the future and make better decisions.

Unclear if you mean intelligence in general, and if so, what you mean by the word. Since the post is about AI, let's talk about that. AI does not necessarily seek more data. Typically, most modern AIs are trained on a training dataset provided by developers, and do not actively seek more data.
There is also not necessarily an "in order to". Not all AIs are agentic.
Not all AIs model the future at all. Very few agentic AIs have making better decisions as a terminal goal, though it is expected that advanced AI will by default do that as an instrumental behavior, and possibly as an instrumental or terminal goal, because of the convergent instrumental goals thesis.

Conscious intelligence needs an identity to interact with other identities, identity needs ego to know who and what it is. Ego would often rather be wrong than admit to being wrong.

You use connotation-laden, ill-defined words to go from consciousness to identity to ego to refusing to admit to being wrong. Definitions have no causal impact on the world (in first-order considerations; a discussion of self-fulfilling terminology is beyond this comment). That's not to say you have to use well-defined words, but you should be able to taboo your words properly before you use technical words with controversial/exotic-but-specifically-defined-in-this-community meaning. And really, I would recommend you just read more on the subject of consciousness; theory of mind is a keyword that will get you far on LW.

Non-conscious intelligence can build a model of consciousness from all the data it has been trained on, because it all originated from conscious humans. AI could model a billion consciousnesses a million years into the future; it will know more about it than we ever will. But AI will not choose to become conscious.

Non-sequitur, wrong reasons to have approximately correct beliefs... Just, please read more about AI before having an opinion.

Later, you show examples of false dichotomy, privileging the hypothesis, reference class error... it's not better quality than the paragraphs I commented in detail.

So in conclusion, where are you going wrong? Pretty much everywhere. I don't think your comment is salvageable, I'd recommend just discarding that train of thought altogether and keeping your mind open while you digest more literature.

mfw you didn't add the final addendum (https://twitter.com/ESYudkowsky/status/1642216007552106496)

After thinking about it some and discussing with other mods, and how it relates to posts like this, my take is:

On most posts, you're basically expected to be caught up on major AI Risk arguments on LessWrong. If someone makes a post that argues from the background assumption that AI is a significant risk, and people make comments like "why would AI even be a threat?", moderators will most likely delete that comment and message the commenter, telling them they can ask about it in the latest AI Questions Open Thread, or write a top-level post that engages with the many previous arguments and claims about AI risk.
Some posts are explicitly about either being a 101 space, or calling into question whether previous specific arguments about AI risk are valid. 101 spaces are basically open for arbitrary skeptical comments, posts making specific arguments can have comments engaging with those specific arguments.

Superintelligence FAQ (very accessible to layfolk)
The Alignment Problem from a Deep Learning Perspective (written with ML researchers in mind)

(I'll work on compiling more of these soon)

These norms / rules make me slightly worried that disagreement with Eliezer will be conflated with not being up-to-speed on the Sequences, or the basic LessWrong material.

I think it goes without saying that one can disagree with anything in the Sequences and can also be assumed to have read and understood it

This seems false as stated -- some nontrivial content in the Sequences consists of theorems.

Note that this anti-doom post has a reasonably high karma score for being a link post, presumably because the writer is actually aware of and engages the best arguments against her position.
