Kellin Pelrine — LessWrong

Kellin Pelrine

164000

Posts

Investigating Accidental Misalignment: Causal Effects of Fine-Tuning Data on Model Vulnerability
6 karma · 8mo · 0 comments

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
37 karma · 1y · 0 comments

GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning
18 karma · 1y · 0 comments

Even Superhuman Go AIs Have Surprising Failure Modes
130 karma · 3y · 22 comments