LESSWRONG
AI Benchmarking
• Applied to Improving Model-Written Evals for AI Safety Benchmarking by Sunishchal Dev 1mo ago
• Applied to Auto-Enhance: Developing a meta-benchmark to measure LLM agents’ ability to improve other agents by Sam F. Brown 4mo ago
• Applied to LLM Psychometrics: A Speculative Approach to AI Safety by pskl 10mo ago
• Applied to Introducing REBUS: A Robust Evaluation Benchmark of Understanding Symbols by Arjun Panickssery 10mo ago
• Applied to MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures by corey morris 1y ago
• Applied to Broken Benchmark: MMLU by awg 1y ago
• Created by rybolos at 1y