Cross-posting from a Twitter thread responding to recent viral comments by @Richard_Ngo about EA, Anthropic, and AI safety as a 'fake field.' Posting here because I expect this to be quite unpopular on LW.
(original thread: https://x.com/CRSegerie/status/2056737155880493357)
AI safety in 2023–2026 was driven by evals, threat models, scary demos, model-organism work, RSPs, and voluntary commitments. Richard calls this "much more of a fake field" and says it "won't generalize".
Here's why I disagree - 1/10
1/ I agree that Anthropic is now the biggest lever. They lead the AGI race, and Mythos moved the White House; this is quite a feat! But many of the specifics are wildly overstated.
2/ Not a blind spot.
Empowering safety-conscious actors at the frontier was openly debated on the forum for years. Calling a deliberate/contested strategy a "blind spot" rewrites history. The bet was visible and explicit.
Personally, I've publicly criticized Anthropic on a few topics, but I still think the field is in a much better position with them in the lead, given the shady behavior at OpenAI.
3/ The effect of Anthropic leading is not just "AGI faster"
Anthropic has many positive externalities:
* Dario has been more candid than most CEOs about risks in public (even if he could still go a lot further)
* They are doing top-tier research and implementing SOTA mitigations
I don't know what I would have done with Mythos in their place. In the past, when I've discussed this with people at Anthropic, I've often updated on the difficulty of being in the driver's seat. I might be wrong, but I don't think it would be easy to improve Anthropic's behavior qualitatively in a game-changing way (even if many substantial improvements are on the table).
4/ Anthropic visibly moved US executive posture, Senate hearings, frontier-lab norms, and the public conversation toward taking the risks seriously.
Yes, they relinquished their RSPv2, and we no longer have the guarant