METR (org)
LessWrong tag, edited by Ruby, last updated 1st Jul 2024
Formerly ARC Evals
Posts tagged METR (org)
METR's Observations of Reward Hacking in Recent Frontier Models (Daniel Kokotajlo, 8mo)
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (habryka, 7mo)
AXRP Episode 47 - David Rein on METR Time Horizons (DanielFilan, 1mo)
Review of METR’s public evaluation protocol (nahoj, JaimeRV, 2y)
METR: Measuring AI Ability to Complete Long Tasks (Zach Stein-Perlman, 10mo)
ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks (Beth Barnes, 3y)
METR's Evaluation of GPT-5 (GradientDissenter, 6mo)
Clarifying METR's Auditing Role (Beth Barnes, 2y)
Introducing METR's Autonomy Evaluation Resources (Megan Kinniment, Beth Barnes, 2y)
Interpreting the METR Time Horizons Post (snewman, 10mo)
Reactions to METR task length paper are insane (Cole Wyeth, 10mo)
METR is hiring! (Beth Barnes, 2y)
CoT May Be Highly Informative Despite “Unfaithfulness” [METR] (GradientDissenter, 6mo)
ARC Evals: Responsible Scaling Policies (Zach Stein-Perlman, 2y)
Is METR Underestimating LLM Time Horizons? (andreasrobinson, 1mo)