Introducing METR's Autonomy Evaluation Resources

Megan Kinniment; Beth Barnes

15th Mar 2024

This is a linkpost for https://metr.github.io/autonomy-evals-guide/index.html

This is METR’s collection of resources for evaluating potentially dangerous autonomous capabilities of frontier models. The resources include a task suite, some software tooling, and guidelines on how to ensure an accurate measurement of model capability. Building on those, we’ve written an example evaluation protocol. While intended as a “beta” and early working draft, the protocol represents our current best guess as to how AI developers and evaluators should evaluate models for dangerous autonomous capabilities.

We hope to iteratively improve this content, with explicit versioning; this is v0.1.

Tags: AI Evaluations, AI Safety Public Materials, Language Models (LLMs), METR (org), AI
