METR (org) — LessWrong

Edited by Ruby, last updated 1st Jul 2024

You are viewing revision 1.0.0, last edited by Ruby

Formerly ARC Evals

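Posts tagged METR (org) (15 of 22 shown)

METR's Observations of Reward Hacking in Recent Frontier Models
Daniel Kokotajlo, 8mo. Karma: 100. Comments: 9. Tag relevance: 2.

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
7mo. Karma: 97. Comments: 43. Tag relevance: 2.

AXRP Episode 47 - David Rein on METR Time Horizons
1mo. Karma: 21. Comments: 0. Tag relevance: 2.

Review of METR’s public evaluation protocol
2y. Karma: 10. Comments: 0. Tag relevance: 2.

METR: Measuring AI Ability to Complete Long Tasks
Zach Stein-Perlman, 10mo. Karma: 242. Comments: 106. Tag relevance: 1.

ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks
3y. Karma: 153. Comments: 12. Tag relevance: 1.

METR's Evaluation of GPT-5
GradientDissenter, 6mo. Karma: 145. Comments: 15. Tag relevance: 1.

Clarifying METR's Auditing Role
2y. Karma: 108. Comments: 1. Tag relevance: 1.

Introducing METR's Autonomy Evaluation Resources
Megan Kinniment, Beth Barnes, 2y. Karma: 90. Comments: 0. Tag relevance: 1.

Interpreting the METR Time Horizons Post
10mo. Karma: 70. Comments: 13. Tag relevance: 1.

Reactions to METR task length paper are insane
10mo. Karma: 67. Comments: 43. Tag relevance: 1.

METR is hiring!
2y. Karma: 65. Comments: 1. Tag relevance: 1.

CoT May Be Highly Informative Despite “Unfaithfulness” [METR]
GradientDissenter, 6mo. Karma: 64. Comments: 3. Tag relevance: 1.

ARC Evals: Responsible Scaling Policies
Zach Stein-Perlman, 2y. Karma: 40. Comments: 10. Tag relevance: 1.

Is METR Underestimating LLM Time Horizons?
andreasrobinson, 1mo. Karma: 40. Comments: 6. Tag relevance: 1.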
