FWIW here's the feedback that I provided:
(basically steel-manning the case for being skeptical; I haven't followed the Mythos/Glasswing discussion closely, so this is mostly just first principles)
I'm guessing that most skeptics are operating on the quite reasonable prior that companies generally don't economically disadvantage themselves (e.g. by willingly declining to release a product that would, apparently, earn them more money)
When a person has this prior, they have two scenarios to consider:
1. Anthropic is a significant outlier among tech companies and is indeed choosing - for purely ethical reasons - to miss out on revenue gained from releasing a better product during a heated race with >$100B at stake
2. Anthropic is a normal company and is taking this action for commercial reasons. While there might be some plausible ethical justification for the decision to not release this product, their comms / PR / partnership strategy is heavily informed by commercial motivations and is having an undesirable effect (hysteria / hype)
To paint a different scenario:
If Anthropic had put out very little communication themselves, and the same information had been announced by UK AISI and other partners, you would imagine this criticism would have been largely defused
I think skeptics implicitly know that Anthropic could have done things much more quietly.
To substantiate this intuition, note that Anthropic could easily have:
1) let the UK AISI do their job by releasing their cyber evaluation report like they do for any other model
2) discreetly formed testing & access partnerships, without doing a PR campaign
Link to UK AISI report:
https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities
Additionally, Anthropic probably could have foreseen this criticism, and yet they chose to take this approach anyway.
A skeptic needs to have an explanation for why Anthropic decided to put together this very glossy website that looks like a product launch: https://www.anthropic.com/glasswing
To anyone who has a more conservative prior on Anthropic's ethical motivations, the Occam's razor conclusion is that this is a marketing stunt to gain some commercial advantage
The argument that EA / AI safety people commonly make is mostly predicated on Anthropic being an outlier company, which itself requires significantly more evidence than the base case that they are a normal company
Many of us within the space feel like we've seen enough evidence to suggest that Anthropic might be an outlier company. Actions like the DoW incident seem to support this.
Over the past decade or so, "Most people's world models contain ~0 gears" has become an increasingly load-bearing premise in my understanding of humanity. Most people, including the ones publicly writing articles and posts on AI, have so little understanding of the subject matter that how they react to new information is nearly uncorrelated with what the information means to someone who does understand it. And unfortunately, in AI, the reality is often such that most people's reaction to a more-correct interpretation is going to be disbelief unless they have a lot of trust in the judgment of the person expressing it or are willing to spend a lot of time learning about the topic.
I agree with @Justin Olive that "Mythos claims are fake/exaggerated/hype/marketing" is a reasonable default on purely social priors if you know nothing about AI or cybersecurity, and nothing about Anthropic in particular, and simply ignore all the technical information.
FWIW, in my experience the most effective counter, at least one-on-one, has been to explain to/remind people that Anthropic has been growing almost 50%/month lately, from $9B in December to $30B in April, and that it is supply-constrained on a software product, which as far as I know has never happened before.
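For what it's worth, the implied compound monthly growth rate is easy to check (a quick sketch; the December-to-April window is ambiguous, so both a 3-month and a 4-month reading are shown):

```python
# Implied compound monthly growth rate from $9B (December) to $30B (April).
start, end = 9.0, 30.0

for months in (3, 4):  # "December to April" could span 3 or 4 compounding periods
    rate = (end / start) ** (1 / months) - 1
    print(f"{months} months: {rate:.1%}/month")
# 3 months: 49.4%/month
# 4 months: 35.1%/month
```

So "almost 50%/month" holds on the 3-month reading, and is still extraordinary on the 4-month one.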
If Anthropic exaggerated Mythos's cyber capabilities, that's on them, and a mistake by both profit and safety lights. If they didn't, it's hard to see how they could have acted much better without spending a lot of time for little return, or allowing leaks to spin rumors out of control, possibly resulting in the same accusations of clever leak-based marketing hype in the absence of a statement.
I do think it might've been smarter to lean a little harder on the "seems maybe pretty dangerous so just in case" narrative.
In either case, time will tell - although there's going to be a Y2K effect where using it to patch vulnerabilities reduces evidence that there was ever a danger.
Hopefully that surfaces in the public debate. They could've gotten a lot more hype by releasing it in a way that allowed hackers to demonstrate its capabilities.
I have the sense that you need to announce it somehow.
Why? Project Glasswing was organized semi-privately, and the announcement came last. Why couldn't the work simply have proceeded with less fanfare?
Well, it does feel useful to make public claims about the state of dangerous capabilities so that people can orient accordingly. But I'm trying to make the point here that, if you make these public claims, you should do it through independent third parties who provide irrefutable evidence.
I think you should specify which subgroup of the 'public' it is important to keep informed of, and receptive to, claims of cyber risk.
For brevity, I would like to provide a highly uncharitable characterization of the circumstances:
I assume there is some obvious flaw with this framing, and hope you may helpfully point it out.
But if it happens to be accurate, then I feel it would reflect a gross overestimation of the causal impact of Anthropic's comms on disbelief, of the ability of people to judge ostensibly irrefutable evidence of model capabilities, and of the political relevance of the particular sphere of people who mislabelled Mythos as a fundraiser.
I have seen a lot of coverage suggesting that Claude’s new model, Mythos, is a vehicle for Anthropic to peddle hype and doom in order to raise money. While some of this is surely motivated by people’s unwillingness to stare into the abyss of our AI future, the breadth of otherwise-reasonable people who have voiced these kinds of cynical opinions suggests that part of the blame rests on Anthropic.
In this post, I want to briefly unpack the way people misinterpreted the evidence, their valid reasons for doing so, and what Anthropic (and the AI safety community more broadly) should learn from this.
In particular, since we will inevitably see other dangerous capabilities spontaneously emerge in the future, we need protocols for how to announce them effectively. To this end, I try to make the following points:
1. Mythos criticisms & missing the point
In one podcast from The Guardian, the reporter says that “this Mythos debacle [has led to] heads of state saying ‘this is so dangerous, it could shred our infrastructure, the end of civilization is nigh!’”. She then says that “accepting the companies’ premise that they are creating a machine god” is helping these companies sell their product.
In a different podcast, Cal Newport (a professor of computer science at Georgetown University!) concludes his analysis of Mythos by saying “it was wrong for Mythos to get the amount of dread-coverage that it got; so far we do not have evidence that it represents a significantly larger leap in detecting or exploiting vulnerabilities than we’ve seen in previous releases.” He has since gone on other podcasts to make these points.
Similarly, the YouTube channel Internet of Bugs posted a video titled “Anthropic’s $x00 Million Marketing Stunt” saying that “we’ve been seeing a lot of this particular attention-grabbing technique lately. I’m sure it’s going to only be getting worse as the AI companies get more desperate to keep up the flow of investment dollars. How are we supposed to believe this shit?”
To be very clear, the point is not only that Anthropic has a model which is good at cybersecurity tasks. The point is that scaling laws are holding and that the inevitable acceleration continues. With each new model, we unlock new and mysterious risks which we have to grapple with. In this case, it was cybersecurity, but it could realistically have been anything. This bigger story was essentially lost amid the hoopla.
2. It feels like people wanted to miss the point?
I understand the instinct to call hype-mongering when a company says it has a big scary thing. E.g., Sam Altman’s tweet of the Death Star before GPT-5. But I am surprised at how eagerly people seize on evidence supporting their prior that Anthropic is just engaging in corporate shenanigans.
For example, there’s this post from an LLMs-for-cybersecurity org saying that other, smaller models were able to find the bugs that Mythos found. They write “we took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis.” This was retweeted by the HuggingFace CEO and, subsequently, used as evidence by all three of the above podcasts/videos.
Am I going crazy? Isn’t this pretty weak evidence for dismissing all the claims Anthropic is making? As others have said, it’s the equivalent of isolating the clump of haystack with a needle in it, giving this clump to a small child, and then saying “wow, they were able to find the needle too”. The point is that locating that clump is the hard part!
3. There must be a lesson here.
If we ignore the whole “accelerating us towards the machine god” thing, I think that Anthropic behaved responsibly with Mythos. I also think there are lessons to be learned regarding the publicity, as evidenced by the fact that reasonable people consistently missed the point.
Anthropic is in the business of making extremely unpopular things which they claim could ruin society as we know it. They also repeatedly say it is too dangerous for the public to have access to these things (I note that caution seems generally warranted).
It is reasonable for people to think Anthropic is crying wolf, especially given that they are on track to have the largest IPO of all time and got there, in part, via prophecies.
Unfortunately, I am in the camp of people who think the prophecies are true. This means that, for me, evidence which supports Anthropic’s worldview exists in superposition: it is both hype and responsible, both doomer and appropriate.
But as we’ve seen, it is difficult to convince other people that this is all really happening. They will (with good reason) consider this self-serving and they will (with good reason) not want to confront reality. If people perceive us as crying wolf, they will grow weary of our frenetic anxieties and our cause will go the way of pandemic-preparedness.
Consequently, I think of AI safety as operating with some amount of don’t-be-annoying capital[1]. This is the amount of sympathy the general public has towards our concerns. It is amassed when AI causes things to go wrong (in the public’s consciousness, not in ours). It is expended when we make claims that people don’t want to believe. It is also expended when AI companies make claims which are plausibly self-serving. It is expended even faster when these claims are poorly substantiated.
I argue we should frame public interactions around trying to grow this resource.
4. Some options for next time
Let’s put ourselves in Anthropic’s shoes: you just made your latest digital nuke. You have to do something with it. What are your options?
I have the sense that you need to announce it somehow. But, if you announce it, you likely expend capital. If the thing you announce does not live up to the hype, you expend even more capital. And we’ve seen that reasonable people will look for unreasonable excuses to dismiss your claims, draining your capital further.
To this end, Anthropic should have done more due diligence with the Mythos release: their model card (while thorough) did not have the rigor of a comprehensive scientific study. The cybersecurity assessment takes up only 6 pages of the 200+ page system card (pages 46-52), and it includes four experiments that compare only against Anthropic’s own models. The blog post goes into more detail about how they found zero-days, but it again misses some due diligence. For example, they could have evaluated other, non-Anthropic models across the capabilities spectrum to verify that Mythos is uniquely able to do these tasks. They also could have run some controls. Extraordinary claims require extraordinary evidence.
I also believe Anthropic should be more up-front about their conflict of interest with regard to making statements of doom. For as good as I believe their intentions here to be, this conflict of interest is real and people are right to perceive it. One simple option to avoid these perceptions would be to prioritize the analysis coming from independent non-profit evaluators. Credit where it’s due: Anthropic did have the UK AISI independently assess Mythos’s capabilities. Across the cynical takes I’ve seen, this analysis was treated with more respect. Of course, even this runs into cynicism, since people will start to think the non-profits are in cahoots with the companies, as evidenced by the comments on the recent NYT profile of METR.
Finally, although random dangerous capabilities will emerge, the point is not any one specific capability. I think letting the narrative get oriented towards the cybersecurity elements of Mythos did a disservice to the public: I doubt most people internalized how big the overall capabilities jump here was, nor that the next such jump will bring new harms into view.
I’m also sure there are considerations which I am not privy to, and recognize that it’s easy to criticize from my cozy corner where nothing I do moves the stock market.
Nonetheless, Anthropic’s first-mover advantage on identifying dangerous capabilities also endows them with a first-doomer responsibility.
Thanks to Erin, Steven, Justin, Joseph and Li-lian for comments.
[edit log]: changed some wording and added a sentence about the blog post from the red team. Also changed the title to be more in line with what I was trying to say; old title was "Anthropic spent too much don't-be-annoying capital on Mythos" but, in retrospect, this isn't fully representative of my view.
I call it 'don't-be-annoying capital' because that's the lived experience on the receiving end and I think it is good to think about this from the perspective of the audience. I'll admit that something like 'warning fatigue' might be a more representative name.