LESSWRONG
Hannah Betts

Posts

29 Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google

2mo
