LESSWRONG
Anthropic (org)
• Applied to Anthropic AI made the right call by bhauth 12d ago
• Applied to OMMC Announces RIP by Adam Scholl 25d ago
• Applied to Vaniver's thoughts on Anthropic's RSP by Gunnar_Zarncke 3mo ago
• Applied to Introducing Alignment Stress-Testing at Anthropic by Gunnar_Zarncke 3mo ago
• Applied to On Anthropic’s Sleeper Agents Paper by Gunnar_Zarncke 3mo ago
• Applied to Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation by Soroush Pour 6mo ago
• Applied to Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy by Zac Hatfield-Dodds 6mo ago
• Applied to Comparing Anthropic's Dictionary Learning to Ours by Robert_AIZI 7mo ago
• Applied to Measuring and Improving the Faithfulness of Model-Generated Reasoning by HenningB 7mo ago
• Applied to Anthropic's Responsible Scaling Policy & Long-Term Benefit Trust by Zac Hatfield-Dodds 7mo ago
• Applied to Towards Monosemanticity: Decomposing Language Models With Dictionary Learning by Zac Hatfield-Dodds 7mo ago
• Applied to Amazon to invest up to $4 billion in Anthropic by RobertM 7mo ago
• Applied to AI Awareness through Interaction with Blatantly Alien Models by VojtaKovarik 9mo ago
• Applied to Frontier Model Forum by RobertM 9mo ago
• Applied to Frontier Model Security by RobertM 9mo ago
• Applied to Anthropic Observations by RobertM 9mo ago
• Applied to Anthropic | Charting a Path to AI Accountability by Gabriel Mukobi 10mo ago
• Applied to Rishi Sunak mentions "existential threats" in talk with OpenAI, DeepMind, Anthropic CEOs by Baldassare Castiglione 1y ago