LESSWRONG
AI Evaluations
• Applied to AI Safety Institute's Inspect hello world example for AI evals by TheManxLoiner 3d ago
• Applied to Mechanistically Eliciting Latent Behaviors in Language Models by TurnTrout 19d ago
• Applied to An Introduction to AI Sandbagging by Teun van der Weij 23d ago
• Applied to Inducing Unprompted Misalignment in LLMs by Sam Svenningsen 1mo ago
• Applied to LLM Evaluators Recognize and Favor Their Own Generations by Arjun Panickssery 1mo ago
• Applied to Claude wants to be conscious by Joe Kwon 1mo ago
• Applied to Measuring Predictability of Persona Evaluations by Thee Ho 1mo ago
• Applied to Run evals on base models too! by orthonormal 1mo ago
• Applied to OMMC Announces RIP by Adam Scholl 2mo ago
• Applied to Third-party testing as a key ingredient of AI policy by jacobjacob 2mo ago
• Applied to DeepMind: Evaluating Frontier Models for Dangerous Capabilities by Ruby 2mo ago
• Applied to AI Safety Evaluations: A Regulatory Review by Elliot_Mckernon 2mo ago
• Applied to Introducing METR's Autonomy Evaluation Resources by Megan Kinniment 2mo ago
• Applied to Protocol evaluations: good analogies vs control by Charbel-Raphaël 3mo ago
• Applied to Self-Awareness: Taxonomy and eval suite proposal by RobertM 3mo ago
• Applied to Critiques of the AI control agenda by Jozdien 3mo ago
• Applied to Skepticism About DeepMind's "Grandmaster-Level" Chess Without Search by Arjun Panickssery 3mo ago
• Applied to Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities by porby 4mo ago
• Applied to The case for more ambitious language model evals by Jozdien 4mo ago