METR (org)
• Applied to: METR is hiring! (by Gyrodiot, 6mo ago)
• Applied to: ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks (by Gyrodiot, 6mo ago)
• Applied to: ARC Evals: Responsible Scaling Policies (by Gyrodiot, 6mo ago)
• Applied to: Review of METR’s public evaluation protocol (by Ruby, 6mo ago)
Ruby, v1.0.0, Jul 1st 2024 GMT (+18)
Formerly ARC Evals
• Created by Ruby 6mo ago