LESSWRONG
AI Benchmarking
• Applied to Workshop Report: Why current benchmarks approaches are not sufficient for safety? by Tom DAVID 22d ago
• Applied to Improving Model-Written Evals for AI Safety Benchmarking by Sunishchal Dev 2mo ago
• Applied to Auto-Enhance: Developing a meta-benchmark to measure LLM agents’ ability to improve other agents by Sam F. Brown 5mo ago
• Applied to LLM Psychometrics: A Speculative Approach to AI Safety by pskl 11mo ago
• Applied to Introducing REBUS: A Robust Evaluation Benchmark of Understanding Symbols by Arjun Panickssery 1y ago
• Applied to MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures by corey morris 1y ago
• Applied to Broken Benchmark: MMLU by awg 1y ago
• Created by rybolos at 1y