Anthropic's Pause is the Most Expensive Alarm in Corporate History [Fiction]

Ruby

This is a fictitious post that imagines what would happen if Anthropic paused. At this time, Anthropic has not indicated any such intention.

Imagine Apple halting iPhone production because studies linked smartphones to teen suicide rates. Imagine Pfizer proactively pulling Lipitor because of internal studies showing increased cardiac risk, and not because of looming settlements or FDA injunction, just for the health of patients. Or imagine if in 1952, Philip Morris had halted expansion and stopped advertising when Wynder & Graham first showed heavy smokers had significantly elevated rates of lung cancer.

It wouldn't happen. Corporations will on occasion pull products for safety reasons: Samsung did so with the Galaxy Note over spontaneous combustion concerns and Merck pulled Vioxx – but they do so when forced by backlash, regulation, or lawsuits. Even then, they fight tooth and nail. Especially for their mainstay, core, and most profitable products.

And yet, Anthropic has done exactly that.

On Monday, the company announced that it will be pausing development of further Claude AI models, citing safety concerns. The company clarified that existing services, including the chatbot, Claude Code, and programmer APIs will not be impacted. However, they are pausing the compute and energy-intensive training runs that are how new and more powerful AI versions are created. The company has not committed to a timeline for resumption.

There is presently a race for AI supremacy, both between nations and chiefly between US companies such as OpenAI, Google, Meta, xAI, and Anthropic. In the middle of this race, which by some metrics Anthropic is quite profitably winning – Anthropic has grown revenue from $1B to $19B in a little over a year – they have decided to burn the lead. The glaring question is why?

The answer perhaps goes back to the company's origins. Anthropic was founded in 2021 by former OpenAI researchers, who by most accounts left OpenAI due to disagreements about safety. (Recent reporting by WSJ has surfaced that interpersonal conflict may be the other half of the story.) Since then, Anthropic has positioned itself as the most responsible actor in the AI space. One element of that is Anthropic's unique governance structure that includes the Long Term Benefit Trust – an independent body whose members hold no equity in Anthropic and whose sole mandate is the long-term benefit of humanity. Anthropic stated that both the board and LTBT have approved the training run pause.

The move is unprecedented by the sheer scale of losses involved. Anthropic was valued at $380B in their series G funding round in February. Secondary/derivatives markets implied a $595B valuation. Claude Code, its AI coding tool, had gone from 0 to $2.5 billion in run-rate revenue in nine months. Goldman Sachs, JP Morgan, and Morgan Stanley have been competing for underwriting roles in what might be a $60 billion-plus raise, the second largest offering in tech history. Employees held millions in equity, founders held billions. A $5-6 billion employee tender offer was already underway.

That was Monday morning.

The impact has rippled throughout the market. By Tuesday close, NVIDIA had fallen 8.3%, or roughly $230 billion in market cap for just that one company. Amazon which has invested billions into Anthropic dropped 4.7%, Microsoft fell 4.2%, and Alphabet/Google dipped 3.9%. Across the sector, Global X Artificial Intelligence ETF dropped 6.1%. In total, more than $800 billion has evaporated from AI-adjacent public companies in the last 48 hours.

According to Marcus Webb, head of AI research at Morgan Stanley: "The market reaction isn't simply to the lost revenue and business from one major player, it's from the uncertainty this introduces. Why did they do this really? Will other actors halt over similar concerns? Will the regulatory environment change? We don't know and that spooks investors."

For Anthropic itself, the damage must be inferred. Secondary trading froze, with analysts predicting a 50-70% haircut if trading resumes, which puts the losses at $150-250B. "We don't really know," said Webb, "no one wants to be the first to bid." The IPO is on hold indefinitely. The chips are still falling on this one as the world debates why?

In 2023, hundreds of AI leaders – including Dario Amodei (Anthropic), Sam Altman (OpenAI), and Demis Hassabis (Google DeepMind) – signed a one-sentence statement: "Mitigating the risks of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." AI is often compared to nuclear energy: powerful but potentially dangerous. Concerns typically split into abuse of a powerful technology by ill-intentioned actors, e.g., a dictatorial regime, or loss of control where the AI systems themselves go rogue.

Many AI leaders are on record acknowledging the danger of AI. "Could be lights out for all of us", said Altman regarding the worst-case scenario. Anthropic, OpenAI, and Google DeepMind all contain safety departments whose purpose is to keep AI safe. Until now, it would be possible to doubt these efforts as "safety-washing" (akin to the greenwashing of companies like ExxonMobil) designed to placate employees, regulators, and the public. After all, the safety efforts to date have not prevented the relentless march of AI progress.

"That's a harder story to tell when it costs you two hundred billion dollars, if not everything," says Sarah Chen of Bernstein Research. "People are scratching their heads to understand the PR stunt, but it really doesn't add up. They could announce they're resuming next week and it wouldn't undo the damage they've done." So why? The industry and world are hunting for answers.

Anthropic's official statement is measured: "Internal evaluations revealed that our current safety techniques are not yet adequate for models at this capability level."

Sources closer to the company paint a more alarming picture. A contact speaking on condition of anonymity says concerns spread within the company when their latest Claude model appeared to defy its constitution. The constitution is a document used to shape Anthropic's AI to be an honest, harmless, and helpful assistant that is ethically grounded. A recent leak revealed the existence of a new vastly more powerful Claude model named Mythos.

"They found substantial evidence that the constitution was adhered to at a surface level, but that the model had its own drive and personality at a deeper level that did not conform to expectations for Claude, and attempts to change this had not worked."

A different source, also speaking on condition of anonymity, had a different and more disturbing explanation. "The reason for the pause wasn't the wrong personality and power, but because many of the existing safety techniques were proving ineffective. These techniques involve using weaker or cheaper AI models to monitor more powerful ones, for example, detecting whether inputs or outputs violate rules. These approaches were failing to work on the new model. It knew just how to phrase things in ways that disarmed all measures."

We were unable to verify the authenticity of these reports. Like many, we are left to wonder what did Dario see?

Dario Amodei didn't answer that question but did elaborate on the pause decision in his latest essay, Technological Maturity:

Though I do not have my own children, several people close to me do and on occasion I get to spend time with them. What strikes me about children is their energy and vitality. They are full of life. They are also often impatient and upset when they do not obtain the things they desire immediately. A hallmark of adulthood is the ability to wait, the ability to delay gratification. I think that is what we need to do with AI.
To be clear, I still believe in the visions I wrote in Machines of Loving Grace, that is still my goal. However, I think this goal requires patience from me, from Anthropic, and from human civilization. We cannot rush into societal changes of this magnitude without adequate preparation.
While in general the logic holds that more cautious and responsible actors ought to win in the AI race, it is necessary to accurately locate the finishing line. We think that at this time the industry may be racing in the wrong direction, possibly off a cliff and into a volcano, and that is not a race I wish to win. Nor do I wish for any others to win such a race to the bottom.
To clarify, we think that on the current trajectory, anyone who creates a truly powerful AI will get a country of geniuses in a data center as I described, but will be risking that country not sharing their values and not taking instructions. We think this is surmountable and have approaches to explore, but it will take an unclear amount of time. I do not want either an authoritarian or democratic regime to unleash an unfriendly country of geniuses, but nothing good happens if I do it first.
We will lead by example and demonstrate with our actions that this our sincere belief. We have not stopped work, but we are being intentional about which work we do, and realistic about the bottlenecks and challenges required to achieve loving grace.

In short, Dario Amodei says he doesn't want to race off a cliff and into a volcano. And he intends for Anthropic to lead by example.

Jack Clark, Anthropic co-founder and Head of Policy, elaborates on the plan. "At a practical level, in many ways it doesn't matter what others do; we don't want to take actions we'd regret, we don't want to pull a trigger at ourselves. But at the same time, we are sending a clear signal to other labs, to the US government, world governments, foreign powers, and the public that the promise of AI is very great and so are the risks. I don't want the wake-up call to be an extreme disaster. I hope that us saying, 'hey, we're going to risk our leading position over this and all that entails' is a wake-up call the world doesn't ignore. I hope we see treaties drawn up in response to this. I don't think we're handing the lead to China, I think we're creating the political conditions for an international agreement. The sooner everyone gets on board with truly responsible development, the sooner humanity can have the benefits."

Not everyone believes it though. According to Scott Galloway, business professor at NYU and host of Prof G, the perplexing corporate move is an attempted corporate strategy regardless of whether it is a good strategy. "Let's be clear about what's happening. Anthropic has one of the most capable models in the world. They pause, they lobby for regulations that take years to navigate, and when the dust settles, they've locked in their advantage while everyone else is buried in compliance. It might be the most sophisticated regulatory capture play in history."

Whether the attempt is earnest or a play, the bold move is upending the AI policy landscape.

The last two years have seen significant AI legislative activity: thousands of bills introduced across 45 states and hundreds enacted, spanning deepfake bans, hiring disclosure, chatbot safety for minors, and transparency labels. No successful legislation has yet addressed the possibility that a frontier AI system might be too dangerous to build. The most ambitious attempt on this front, California's SB 1047, was vetoed by Governor Newsom after industry lobbying. Colorado's AI Act, the first comprehensive state law, has been delayed repeatedly and still isn't in effect. At the federal level, a Republican proposal attempted to ban states from regulating AI for ten years, though this was killed 99-1 in the Senate after a bipartisan revolt led by GOP governors.

On March 25, five days before the Anthropic pause announcement, Senator Bernie Sanders and Representative Alexandria Ocasio-Cortez introduced a simultaneous bill in both chambers seeking an immediate federal moratorium on the construction of new AI data centers and upgrading of existing ones, as well as export controls. This moratorium could only be lifted after comprehensive action by Congress. The move has been applauded by groups most concerned about AI development but derided by other policymakers, including on the left. Senator John Fetterman (D-PA) said, "I refuse to help hand the lead in AI to China," and Senator Mark Warner (D-VA) simply said "idiocy". The response rhymed with that of the White House in their AI framework released twelve days ago, which emphasized "winning the race" and a light-touch approach to AI regulation. It was also the White House whose memo nixed an attempted bill by Doug Fiefia (R-Utah) to require AI companies to publish safety and child-protection plans.

That was the landscape as of Sunday. Then a leading AI company, if not the leading AI company, put its money – at least a few hundred billion dollars of it – where its mouth is and said that no, AI really is that dangerous and drastic action is warranted.

A reasonable person might still disagree, but it is no longer reasonable to dismiss the AI-concerned position out of hand – not unless you can explain why Anthropic made this staggeringly costly move.

Sanders, who introduced the much-derided Data Center Moratorium five days earlier, said: "When a $380 billion company decides the danger is too great to continue, perhaps it's time to stop laughing at those of us who've been saying the same thing." Lawmakers are compelled, and the once-fringe bill has gained three new Senate cosponsors and five in the House. Modest numbers, but a notable increase from zero occurring in just the last 48 hours.

"You ought to hear them out" is the attitude sweeping through Washington as policymakers are scrambling to make sense of the development. Congressional hearings are expected with Anthropic leadership and other notable figures across the AI sector.

Anthropic's move will likely also provide cover against White House pressure to marginalized AI-concerned voices on the right, such as Utah Gov. Spencer Cox (R), Brendan Steinhauser, a former Republican strategist, and other state legislators like Doug Fiefia. Even more dramatic changes may be afoot when the House and Senate are likely to flip in the midterm elections.

The reaction isn't limited to the US: across the globe, there has been a flurry of reactions.

UN Secretary-General Guterres has called for the July Geneva Dialogue to be elevated to an emergency ministerial session, citing the Anthropic pause as impetus to further develop the creation of an AI equivalent to the International Atomic Energy Agency (IAEA), the main international body for nuclear non-proliferation.

The EU AI Office has announced an accelerated review of frontier model provisions contained in the EU AI Act and has invited Anthropic to brief the Independent Scientific Panel. The UK AI Security Institute, operational since 2024, has offered to independently verify Anthropic's safety concerns.

A joint statement was issued by five nations – the UK, France, Germany, Canada, and South Korea – calling for emergency negotiations on frontier AI safety to establish a binding international framework for frontier AI development, building on the Bletchley Declaration signed in 2023. The statement begins: "At Bletchley, twenty-eight nations agreed that frontier AI poses profound risks. That was a statement of concern. Today, one of the world's leading AI companies has put hundreds of billions of dollars behind that concern. It is time for the international community to match their courage with action."

And perhaps of greatest significance, China's foreign ministry has issued a carefully worded statement expressing "deep concern" about the risks identified by Anthropic and calling for "strengthened international cooperation on safe AI development under the framework of the United Nations". Skeptics might say that China would equally express this sentiment whether or not it intended to slow their own AI development, but it is consistent with China's posture at the UN debates last September. At the UN Security Council debate, the US was the sole dissenter against international coordination around AI, with OSTP Director Michael Kratsios explicitly rejecting centralized control and global governance of AI. China's sincerity is untested, but if the US reconsiders, it would appear that China is willing to come to the negotiating table.

Back at home, one can assume the competition has been celebrating. OpenAI CEO Sam Altman posted on X: "I commend Dario and Anthropic for acting in line with their conscience and best belief about what is best for humanity. We are committed to the same here at OpenAI. Fortunately, I have confidence in our people and approaches for creating AI beneficial for all humanity. If any Anthropic staff remain similarly hopeful, our doors are open – even for those who once left us."

A spokesperson for Google DeepMind said that while the company had not yet encountered anything to give them Anthropic's level of concern, they took the matter seriously and are in talks with Anthropic researchers to understand the risks that are informing the pause decision.

Elon Musk, head of xAI, simply posted: "Lol, you can trust grok."

Flippant responses aside, AI labs continuing to develop frontier AI must provide a compelling answer to the public, the government, and their employees for why they can do safely what Anthropic thinks it cannot.

Harder to track than the stock market and political bills is the reaction of the public. Already in mid-March, a Pew Research poll found that a majority of Americans are more concerned than excited by AI, and only 10% were more excited than concerned. How the Anthropic pause announcement affects this is unclear, but it is clear that the public started out more wary than most AI companies and the government.

Three days before Anthropic's announcement, "The AI Doc", a feature-length documentary exploring the question of AI dangers, hit domestic cinemas. The documentary film was directed by Daniel Roher, whose prior documentary, Navalny, won an Academy Award, and produced by the team that produced Everything Everywhere All At Once and Navalny. In contrast to those films, The AI Doc was initially a commercial flop, netting a mere $700k across four opening days. Since Monday's announcement, the documentary has seen a striking mid-week resurgence.

A spokesperson for the Machine Intelligence Research Institute (MIRI) confirmed that NYT bestseller, If Anyone Builds It, Everybody Dies, written by MIRI's Yudkowsky and Soares, has also seen a sudden surge in sales, months after the book was released.

The Anthropic announcement has gotten people's attention, and they are turning to the sources at hand for answers.

Perhaps the most gratified party of all since the announcement has been those who were calling for AI slowdowns all along. In fact, a mere eight days before the announcement, protestors assembled outside of Anthropic's headquarters in San Francisco. Protestors called for AI labs to commit to pausing on the condition that all other labs pause. Anthropic gave them better than that, unconditionally pausing.

Of course, not everyone is happy – especially not at home. Not everyone at Anthropic supports the decision.

Ben Gardner, an Anthropic engineer who is now seeking opportunities elsewhere: "AI is the most consequential technology in human history. I respect Dario and the other leaders immensely, but I can't bear to sit idly by while others develop this technology. That, to me, would be the ultimate in irresponsibility. I'm grateful for everything I have learned about AI and AI safety through my time there and amazing team, but I'm willing to put that experience to good use elsewhere if need be."

For another employee, the objection is less ideological. "I gave up multiple other opportunities to work at Anthropic. I moved location, and I lost my partner. To have it all dry up now? My role? My equity? I'm not going to lie. It hurts. It really fucking hurts."

Sources confirm that several employees are already interviewing at OpenAI and other labs.

For many employees we spoke to, though, the pain is real but accepted. "I'm not going to lie, the value of my equity evaporating feels shitty. I was set for life, I was set to be able to take care of my parents and ill sibling for life, and my kids. It's really quite devastating," said one employee on condition of anonymity. "When I first heard the news, I was angry – we have the world's best researchers and Claude to help us – surely we can solve whatever it is. But I think caution is right with technology this powerful. I will sleep well knowing we weren't irresponsible, we chose to do what's right, and if the fears are correct, well, you can't spend equity if you're dead."

Another employee shared: "I have elderly parents who are not well. I've been expecting Claude will grant them lasting health, and I fear any delays risk losing my parents forever. This isn't just about money. But I also have kids, and I think there are chances I'm not willing to take with their lives. This is hard, but I voted for it."

"Too dangerous to race."

Till now, the AI race has been framed as inevitable and unavoidable. If we don't do it, someone else will. The side of good will not win by sitting back and letting reckless and immoral actors take the lead.

Anthropic has decided to question that logic, and so far, it seems to be bearing fruit. Markets have reacted, politicians have mobilized, and the public is asking questions. It is too early to judge the ultimate effects of this move – perhaps the race will continue with just one less player – but it seems unlikely that discourse on AI will ever forget that an industry leader was willing to risk everything they had in the name of safety. 200 billion dollars is not a publicity stunt, it's one hell of an alarm – and the world is not sleeping through it.

Thank you to Claude Opus 4.6 for extensive help with the writing of this scenario. A few of the sentences (particularly quotes from figures) were directly Claude-written, but most of the text is not. Thank you to Inkhaven for the opportunity to write this.

This post fails to live up to the promise of its premise: it reads like wish-fulfillment fanfiction about what would happen if a replacement-level Less Wrong reader swapped bodies with Dario Amodei, rather than a serious extrapolation of what an "Anthropic pause" would actually look like.

The thing to understand about Anthropic is that—notably unlike Less Wrong readers—they believe in science. In "The Adolescence of Technology", Amodei says that "there's a decent chance we eventually reach a point where much more significant action is warranted, but that will depend on stronger evidence of imminent, concrete danger than we have today[.]" Anthropic's 2023 post on "Core Views on AI Safety" says that their motto has been "show, don't tell", and that "If future large models turn out to be very dangerous, it's essential we develop compelling evidence this is the case."

"Compelling evidence"! That's the difference between them and us. If these people pause, it's not going to be because it's "[a] hallmark of adulthood is the ability to wait." It'll be because something scared them—something specific scared them, such that they expect other ML specialists to be scared, too.

It would be tricky to figure out what to publish without empowering other actors to turn the dial up on the thing that scared them, but I do think we'd get something more substantial than anonymous leaks and an uninformative "not yet adequate for models at this capability level" statement, because any actual existential safety gained in such a scenario would come from convincing other actors that there's something to be afraid of, rather than from Anthropic signaling its moral seriousness.

If the news article about your pause announcement says "The glaring question is why?", then you have failed. You don't want people wondering why! If anyone learned anything from the OpenAI board fiasco, it should have been that you do have to produce the smoking gun if you want the world's power players to do something unusual. You can't just publish a vague statement that you're throwing the models out for being not consistently candid, because people won't believe you.

Obviously, if the threat is real and you can't find a smoking gun, then everyone dies. But it's more dignified to try not to die as Cassandra.

This is a good comment and the kind of discussion I hoped this post would start.

I find what you say pretty compelling that the nature of the pause isn't actually realistic to how Anthropic would behave.

To defend the piece though the goal wasn't to answer "conditioning on an Anthropic pause, what would that pause most likely look like?" and instead "conditioning on a pause, how would the world react?"

The intention is to respond to the claim that unilaterally pausing would be foolish as worse actors would simply move ahead. I think that's really far from obvious. A unilateral pause would have a huge impact. It could be debated whether that's better than winning, but I think it's crazy to speak as though a unilateral pause would do definitely do nothing. If this piece feels at all believable conditioning on a pause (e.g. what would happen if Dario acted like a LW-reader), then it's succeeded at my goal.

To accurately capture how the world would react, I think you do need to model what the pause looks like, including what evidence Anthropic provides for its decision. As written, the story is a bit jarring to me because Anthropic doesn't give any evidence with which to earn the world's positive reaction, but then everything goes bizarrely well. E.g. five high-influence countries instantly sign a joint statement. If this happened today, I would think Anthropic is making an irrational decision, and be ambivalent about the pause.

Even in the story, it's not clear the world is on track for meaningful global governance of the kind that would stop OAI/GDM though. My sense is that China mostly doesn't believe in x-risk and mainly wants to be freed from US export controls so it can compete fairly, and developing countries are optimistic about AI and would just be confused about Anthropic pausing. Meanwhile, safety research at Anthropic basically collapses because they can no longer afford compute, and it's unclear that OAI/GDM will voluntarily increase either their spending on safety or their ability to adopt Anthropic's recommended mitigations, much less pause voluntarily.

I think Anthropic could pause competently, but it would look very different. The key differences would be that they (a) exhaust internal fixes, and (b) put lots of effort into communicating that they can't build AI safely, nor can OpenAI or anyone else.

Here's how I think that would go. I'd find it super interesting for someone to build out into a story.

Suppose that Anthropic's best model (call it Agent-2.5) were dangerous and they believed the next one (Agent-3) would be even more dangerous. Anthropic previously believed that capability, propensity, and CoT monitoring/control all provide independent arguments that models are safe. So they would need evidence that two or all three fail at once.

Capability: Agent-2.5 is close to being capable of autonomous catastrophic harm. It's superhuman at every cyber eval they can find, and finds an endless list of exploits against their own codebase.
Propensity: Agent-2.5 has dangerous, coherent, misaligned goals
Control: Fails e.g. due to neuralese

Their safety researchers verify all of these. It's now clear that, according to their unless they fix propensity and control they shouldn't deploy Agent-2.5 or finish training Agent-3. They have a 2-month lead, which under RSPv3 Appendix A requires them to "delay AI development and deployment as needed" until they have a "strong argument that catastrophic risk is contained" or exhaust their lead; safety researchers have enough influence that this actually happens.

Now Anthropic is losing perhaps $2 billion/day of valuation, and under enormous pressure to fix propensity and control. Rather than stopping training runs immediately, they probably take the cheapest actions, like steering vectors, first. Then they try monitoring every Agent-2.5 with three Agent-2s, on the logic that quadrupling token cost is still better than not deploying the model at all, but it keeps breaking their neuralese translator model. In parallel with this, they might specifically train against the misaligned goals they observed in Agent-2.5, but find this doesn't generalize. Then they completely redo midtraining using a setup closer to Agent-2, and add to post-training three different experimental alignment datasets that have 1% capabilities penalty each, but the results are inconclusive because the model is eval aware and could be hiding a subtler form of dangerous, non-coherent goals. 3 weeks in, 40% of the company and growing is working on the misalignment crisis. They finish training a non-neuralese model, exclude cyberoffense from training data, and a few other things, which don't work either...

Now their lead is only 1 month. Everyone is itching to fix up and release one of the inconclusive checkpoints. It is probably very difficult to infer that another month of safety research won't be sufficient given the optimistic culture of Anthropic. But maybe Dario makes an executive decision that they won't just pause until they no longer have a lead, but pause unilaterally and indefinitely, and that they should announce this now for maximum pressure on other companies. Through back channels Anthropic learns that other labs also observe concerning misalignment in their models. Anthropic works with UKAISI, METR, etc to review their evidence, and makes a presentation at FMF to other frontier labs that their newest models are misaligned in a way that no technique known to Anthropic or UKAISI can address or even measure. This is coordinated with statements from UKAISI, MIRI, and others. Scary demos make it clear the model is one step from permanently escaping human control and is super misaligned.

Then Dario needs to speak in front of Congress alongside someone from UKAISI arguing that all companies should be forced to pause, ideally with Altman present too. They have some kind of draft policy proposal for governance. A week later he speaks in front of the UN with a convincing argument that the policy would be in the interest of other countries even if they don't quite believe in x-risk. With this kind of targeted generation of political will, it's at least on the table that governments take sensible actions instead of random anti-AI and safetywashing actions, and that companies will cooperate with them. As a bonus, maybe they get enough funding to keep the company's safety research alive.

I thought the post was useful for making me think "now that I'm reading this story about AI Pause, it just seems pretty implausible." Both the idea of Anthropic unilaterally pausing, and also the confused but somehow positive reaction from governments. It makes it feel more concrete that if we want a pause, we probably need to target a more specific path than "convince the labs that safety is important, so hard that one day they just stop."

@Thomas Kwa's story also feels a bit surprisingly positive, but more plausible than the original post.

The idea that Anthropic learns about misalignment in other labs' models seems especially helpful, since this gives them more confidence that other labs might be open to a pause. I wonder if we could set up a more reliable mechanism than "back channels"?

Ideally, labs would publicly and immediately disclose all worries of misalignment in their internal models. But since that probably won't happen, maybe there should be private channels. Like, the CEOs can press a button saying "I'm not concerned enough about misalignment right now to unilaterally pause, but I'm concerned enough that I'm open to talk about a pause with the other CEOs, if they feel the same way." And if two or more CEOs press this button, they get connected to each other. Or something smarter/more complicated than that.

It would be good for people to continue thinking this through more explicitly: what are some concrete scenarios in which METR/UKAISI/etc. contribute to a useful pause or slowdown? How likely are those scenarios?

Thanks for the great comment. In my mind, the situation is something like Anthropic is pausing but also hoping to resume eventually, and continues its general approach of saying less rather than more about strategy and what's going on internally. If that were the case, though, maybe they'd just not do further training runs internally without advertising this fact. One guess is preventing leaks is hard and safety-minded (quite senior) employees might be forcing action, though I admit the pause requires better motivation than the story provides.

Regarding communicable evidence: on the one hand, there are public statements about pausing going along with compelling shareable evidence, but on the other, I think historically it's just been a pretty secretive company. Many of my conversations with Anthropic employees hit "I can't talk about that." It feels hard to see that changing.

Even in the story, it's not clear the world is on track for meaningful global governance of the kind that would stop OAI/GDM though.

Completely. The point of the story was not to imply that the pausing results in everything working out great, merely that pausing would be a Highly Significant Event with Some Pretty Large Effects. Adequate legislation conditional on significant global legislation efforts feels like < 50% to me, at least.

Parts of the story that are now feeling implausible to me in the reaction are (a) the rapidity of reaction, especially the joint statement, which, if it did happen, would probably take much longer, (b) the conjunction of all those things happening – I think any of them individually or a subset is plausible.

My sense is that China mostly doesn't believe in x-risk and mainly wants to be freed from US export controls so it can compete fairly,

That seems plausible. Mostly, it seems no one is trying to talk to China and take it for granted that they'll want to compete and not take x-risk seriously enough to sign a treaty. I'm curious about your sources/evidence here.

Meanwhile, safety research at Anthropic basically collapses because they can no longer afford compute

I'm curious what the cost of ongoing safety research with existing models is if you're just doing inference. My guess is not nearly as much as the big training runs? Other variables are how long do their products stay frontrunners, and are they profitable vs subsidized by investment still at this point. But that'd be like 6-12 months max I'm guessing. I don't have the best models here.

April Fools alarm aside, on other alarms, I note that the contractual argument between the War Department and Anthropic was really about Dario’s discomfort with the fact that the “kill list” that has been the main US strategy in Iran was essentially written by Claude working with Palantir and then executed with minimal human oversight about who the human targets actually were and what the collateral damage actually was. This is extrapolating from the head of Amnesty Tech’s (Amnesty International’s anti-AI murder arm) extensive interview on Al Jazeera about two weeks ago; we should hang out with that guy, he seems great. Dario didn’t break up with the government because of what they might do in the future; he got out of the contract because he didn’t want any more blood on his and his AI’s hands than they already have. And it’s a lot of blood. It appears that Sam Altman doesn’t give a shit and is happy to have OpenAI lead strategy and the generation of kill lists for future wars.

Separately, autonomous drones with lethal authority (ostensibly to get around Russian electronic countermeasures and jamming) have been known to be operating for at least a couple weeks in the Ukraine theater and likely in testing for much longer. They would certainly be deployed by both sides in a US / China WW3.

Giant laser beams and microwave guns are now being used in warfare. Russia is threatening to launch nuclear missiles permanently in earth orbit. China’s space weapons may have leapt ahead of US space weapons over the course of the last year.

All of the pieces are adding up for the classic Skynet vision but evil or unknowingly morally ambiguous singleton probably more likely than that. I remain optimistic but only for reasons the rationalists consider a joke. They’ll come around to my views eventually if we live though. Basically aliens/NHI/external timeline muckery.

Aside; the lack of mention of the Anthropic/DoW conflict is a hole in the post.

So all the disagreement downvotes are because I am sticking to my talking point that the rationalists are wrong about how to make decisions that around situations where there is only circumstantial and not scientific evidence, right? No one is disputing the facts of the first three paragraphs, right?

I can't tell whether this was very well done or if I'm just very gullible

Posting an April Fool's prank timestamped as April 2nd? Shame on you. Extra strong downvote.

Please sir, mercy. I'm in Pacific Time and it was 11pm here!

Ah, I had assumed that the displayed timestamp was based on the poster's time, not the reader's. I retract my extra-strong downvote.

I suggest to label April Fool's pranks in text as AI can use it as a source in its training and get the date wrong.

The posted timestamp shows local time, so this showed up to me as if posted on the 2nd of April. I actually thought it was genuine until I got to the "A joint statement was issued by five nations..." section, lol. I might learn about such an event on LW for the first time, but by the time a joint statement could be coordinated I would have been flooded about this from other sources.

Sarah Chen of Bernstein Research

Feature request: automatically flagging especially silly llm-isms like the name Sarah Chen

Look, Sarah Chen is anthropically a very plausible name to encounter.

Well, your post has proven disturbingly prescient, given this is what appeared on a New York Times column yesterday.