LESSWRONG
AI Evaluations
• Applied to Ablations for “Frontier Models are Capable of In-context Scheming” by AlexMeinke (2h ago)
• Applied to Ideas for benchmarking LLM creativity by gwern (2d ago)
• Applied to Frontier Models are Capable of In-context Scheming by Marius Hobbhahn (12d ago)
• Applied to How to make evals for the AISI evals bounty by TheManxLoiner (15d ago)
• Applied to Call for evaluators: Participate in the European AI Office workshop on general-purpose AI models and systemic risks by Tom DAVID (21d ago)
• Applied to Which evals resources would be good? by Marius Hobbhahn (1mo ago)
• Applied to The Evals Gap by Marius Hobbhahn (1mo ago)
• Applied to Agency overhang as a proxy for Sharp left turn by Eris (1mo ago)
• Applied to A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More by Sharat Jacob Jacob (2mo ago)
• Applied to UK AISI: Early lessons from evaluating frontier AI systems by Ruby (2mo ago)
• Applied to BIG-Bench Canary Contamination in GPT-4 by Jozdien (2mo ago)
• Applied to Sabotage Evaluations for Frontier Models by David Duvenaud (2mo ago)
• Applied to Improving Model-Written Evals for AI Safety Benchmarking by Sunishchal Dev (2mo ago)
• Applied to An Opinionated Evals Reading List by Marius Hobbhahn (2mo ago)
• Applied to Biasing VLM Response with Visual Stimuli by Jaehyuk Lim (2mo ago)
• Applied to LLM Psychometrics and Prompt-Induced Psychopathy by Korbinian K. (3mo ago)
• Applied to New Capabilities, New Risks? - Evaluating Agentic General Assistants using Elements of GAIA & METR Frameworks by Raemon (3mo ago)