METR (org) — LessWrong
Edited by Ruby, last updated 1st Jul 2024 (revision 1.0.0)
Formerly ARC Evals
Posts tagged METR (org), sorted by most relevant:
METR's Observations of Reward Hacking in Recent Frontier Models (Daniel Kokotajlo, 10mo; 100 karma, 9 comments)
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (habryka, 9mo; 97 karma, 43 comments)
AXRP Episode 47 - David Rein on METR Time Horizons Ω (DanielFilan, 3mo; 21 karma, 0 comments)
Review of METR’s public evaluation protocol (nahoj, JaimeRV, 2y; 10 karma, 0 comments)
METR: Measuring AI Ability to Complete Long Tasks Ω (Zach Stein-Perlman, 1y; 242 karma, 106 comments)
ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks Ω (Beth Barnes, 3y; 153 karma, 12 comments)
METR's Evaluation of GPT-5 Ω (GradientDissenter, 8mo; 145 karma, 15 comments)
Clarifying METR's Auditing Role Ω (Beth Barnes, 2y; 108 karma, 1 comment)
Introducing METR's Autonomy Evaluation Resources (Megan Kinniment, Beth Barnes, 2y; 90 karma, 0 comments)
Interpreting the METR Time Horizons Post Ω (snewman, 1y; 70 karma, 13 comments)
Reactions to METR task length paper are insane (Cole Wyeth, 1y; 67 karma, 43 comments)
METR is hiring! (Beth Barnes, 2y; 65 karma, 1 comment)
CoT May Be Highly Informative Despite “Unfaithfulness” [METR] Ω (GradientDissenter, 8mo; 64 karma, 3 comments)
ARC Evals: Responsible Scaling Policies (Zach Stein-Perlman, 3y; 40 karma, 10 comments)
Is METR Underestimating LLM Time Horizons? (andreasrobinson, 3mo; 40 karma, 6 comments)