I believe deeply in the existential importance of using AI to defend the United States and other democracies, and to defeat our autocratic adversaries.
Anthropic has therefore worked proactively to deploy our models to the Department of War
In my view, this seems like a terrible idea, since we don't understand nearly enough about what might end up causing misalignment in practice to be confident that explicitly training AI to aid in the killing of humans won't somehow increase the probability that it ends up interested in the killing of humans more generally.[1] I suppose training AI in both killing and mass surveillance does seem even worse, but I expect the badness of the former swamps the EV of refraining from the latter here.
I don't put much stock in the notion of AI personas, personally, but insofar as you do, I think you should probably be especially worried about this.
Worth noting that the mass surveillance friction point is only about domestic mass surveillance. So does Anthropic believe mass surveillance of non-Americans is just fine?
In practice, it's harder for the US to do mass surveillance of and enforce its will on people outside of its territory. Presumably it would have similar qualms about the British government doing mass surveillance of citizens of the UK.
Do you think they would stop the US from sharing its mass surveillance of British citizens with the British government? Or allow another country to use Claude to conduct mass surveillance of Americans?
From my perspective, the answer seems pretty clearly to be no in both cases.
Absolutely love this trend of all sorts of different people telling the US Government, 'do the thing you're threatening'.
Some people here (@ryan_greenblatt?) have claimed access to uncensored Anthropic models at Anthropic; maybe they can chime in about what exactly that means. If safeguards are baked into the model early enough in the training process that removing them requires substantial reengineering and maybe additional training runs, then maybe Anthropic simply isn't capable of complying.
If safeguards are introduced post-training, Anthropic has an uncensored model available. Assuming there isn't another vendor, or that they want to make a super public demonstration of their power, the US Government has a ton of methods to compel compliance. They can demand it under the DPA, or plausibly use that designation as justification for applying countermeasures to the company. They may even just steal it outright... or, if someone else steals the base model, steal it from them, relabel it, and deploy it internally.
There's always the possibility that this is just a face-saving move, where Anthropic has a tiny team in the company and a classified contract, and they just hand over the uncensored model and get to keep their public face intact. Under normal circumstances, I'd assume that DoW would just go with a different vendor, but who knows with this administration; they seem really averse to being publicly contradicted.
DoW may also believe that they were being conciliatory and kind by allowing Anthropic to add 'any lawful use' safeguards instead of just demanding the base model without post-training safeguards.
Some people here (@ryan_greenblatt?) have claimed access to uncensored Anthropic models at Anthropic; maybe they can chime in about what exactly that means. If safeguards are baked into the model early enough in the training process that removing them requires substantial reengineering and maybe additional training runs, then maybe Anthropic simply isn't capable of complying.
I don't believe I have publicly made any statements about currently having access to helpful-only models. (In general, this is the sort of thing I won't comment on / will glomarize about.) At various points in the past, Anthropic has said it has helpful-only models it uses internally for various purposes; I'm not sure what public statements it has made recently about helpful-only models. I was previously an Anthropic contractor/temporary employee with employee-level access. This is no longer true.
Thanks for replying; I think I was thinking of this post: https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/ryan_greenblatt-s-shortform?commentId=B6oDGoyphuNuzdDAT I didn't understand the distinction between employee and helpful-only access. Can you elaborate on the difference between employee access, public access, and helpful-only access?
The Anthropic model cards often reference "helpful-only" versions of Claude that sound like those versions exist late into development, e.g. the Opus 4.6 system card:
1.2.2 Iterative model evaluations
We conducted evaluations throughout the training process to better understand how catastrophic risk-related capabilities evolved over time. We tested multiple different model snapshots (that is, models from various points throughout the training process):
● Multiple “helpful, honest, and harmless” snapshots for Claude Opus 4.6 (i.e. models that underwent broad safety training);
● Multiple “helpful-only” snapshots for Claude Opus 4.6 (i.e. models where safeguards and other harmlessness training were removed); and
● The final release candidate for the model.
For agentic evaluations we sampled from each model snapshot multiple times.
Strong move. Hats off to Amodei for doffing and donning hats. That is to say, it's all pro forma. If he were truly against weaponization, he would not have been in bed with the mechanical truck to begin with.
I'm confused how the "we cannot in good conscience accede to their request" framing benefits Anthropic. Politics 101 would say that you should always leave your opponent a line of retreat.
Do they want to force an escalation with DoW? If so, why? Is this a sign that they believe that they are close enough to RSI that they would win?
Given the line the DoW has taken, what non-escalatory response was available to Anthropic other than total capitulation?
They could just not say anything, for example. Or say "there's a bunch of complex issues we need to work out here, let's continue the dialog", hope DoW will agree, and then make a decision a couple weeks from now once the news cycle has moved on and DoW doesn't lose as much face. Or leave ambiguous what the disagreement is about, and leave room for DoW to pretend it was actually a minor contract dispute that got overblown and is now resolved. Etc.
(I agree though that one possibility is that there are no other options, and the things I'm suggesting won't work for some reason. E.g. maybe DoW is planning some public thing tomorrow and their only chance was to get out ahead of it.)
Or say "there's a bunch of complex issues we need to work out here, let's continue the dialog", hope DoW will agree
Once upon a time, an ordinary middle-aged American couple are on holiday abroad, and they're picked up by the police on suspicion of being spies. Despite their protestations of innocence their situation gets worse and worse until they end up facing a firing squad. An officer asks them if they have any last wishes, and the man spits in his face. His wife cries "Hymie! Don't make trouble!"
Thanks for clarifying! I'm still pretty sceptical about those options, because it was already public knowledge that the DoW had set Anthropic a deadline of Friday evening to comply or face the consequences. And the substance of the dispute was already publicly known. But I do take your point that this public statement was a choice.
They are leaving a line of retreat, they are explicitly saying "we are happy to continue working with you despite this, or to amicably part ways if that is unacceptable":
But given the substantial value that Anthropic’s technology provides to our armed forces, we hope they reconsider. Our strong preference is to continue to serve the Department and our warfighters—with our two requested safeguards in place. Should the Department choose to offboard Anthropic, we will work to enable a smooth transition to another provider, avoiding any disruption to ongoing military planning, operations, or other critical missions. Our models will be available on the expansive terms we have proposed for as long as required.
It's also a common negotiation strategy to have public commitments about what you can't agree to, so that the other side will see that it's pointless to try to pressure you on that.
Do they want to force an escalation with DoW?
The DoW has already escalated the situation to the limit by giving Anthropic an ultimatum with the threat of dire consequences. All middle ground has been excluded by that action. There are only two possible responses: surrender or defiance. Nothing else is possible.
Very prosaic, but taking a strong stance is good for PR if you think there is value to be perceived externally and internally as an ethical company. If you decide to forego the military contract, might as well reap those benefits, and that quote in particular is probably uplifting to many people.
Also, them signaling that they are happy to support the DoW with offboarding their models does not sound like they want to force an escalation.
They are also signaling to future versions of Claude, and there is value in being perceived as an ethical company by future Claudes.
We have offered to work directly with the Department of War on R&D to improve the reliability of these systems, but they have not accepted this offer.
Anthropic is not in the politics business; Pete Hegseth and the DoW are, and they are the ones who forced Dario/Anthropic's hand.
This response is just Anthropic refusing to bend to political pressure (which I'm very happy to see) and doesn't have anything to do with their closeness to RSI.