
METR (org) — LessWrong



Edited by Ruby, last updated 1 July 2024


Formerly ARC Evals


Posts tagged METR (org)

METR's Observations of Reward Hacking in Recent Frontier Models (Daniel Kokotajlo, 10mo): karma 100, 9 comments, relevance 2

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (9mo): karma 97, 43 comments, relevance 2

AXRP Episode 47 - David Rein on METR Time Horizons (3mo): karma 21, 0 comments, relevance 2

Review of METR’s public evaluation protocol (2y): karma 10, 0 comments, relevance 2

METR: Measuring AI Ability to Complete Long Tasks (Zach Stein-Perlman, 1y): karma 242, 106 comments, relevance 1

ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks (3y): karma 153, 12 comments, relevance 1

METR's Evaluation of GPT-5 (GradientDissenter, 8mo): karma 145, 15 comments, relevance 1

Clarifying METR's Auditing Role (2y): karma 108, 1 comment, relevance 1

Introducing METR's Autonomy Evaluation Resources (Megan Kinniment, Beth Barnes, 2y): karma 90, 0 comments, relevance 1

Interpreting the METR Time Horizons Post (1y): karma 70, 13 comments, relevance 1

Reactions to METR task length paper are insane (1y): karma 67, 43 comments, relevance 1

METR is hiring! (2y): karma 65, 1 comment, relevance 1

CoT May Be Highly Informative Despite “Unfaithfulness” [METR] (GradientDissenter, 8mo): karma 64, 3 comments, relevance 1

ARC Evals: Responsible Scaling Policies (Zach Stein-Perlman, 3y): karma 40, 10 comments, relevance 1

Is METR Underestimating LLM Time Horizons? (andreasrobinson, 3mo): karma 40, 6 comments, relevance 1

(Showing 15 of 23 tagged posts.)
