METR has never intended to claim that it has audited anything, or that it provides meaningful oversight or accountability. Even so, there has been some confusion about whether METR is an auditor or plans to become one.
To clarify this point:
“We granted the Alignment Research Center (ARC) early access to the models as a part of our expert red teaming efforts … We provided them with early access to multiple versions of the GPT-4 model, but they did not have the ability to fine-tune it. They also did not have access to the final version of the model that we deployed. The final version has capability improvements relevant to some of the factors that limited the earlier models power-seeking abilities, such as longer context length, and improved problem-solving abilities as in some cases we've observed. … fine-tuning for task-specific behavior could lead to a difference in performance. As a next step, ARC will need to conduct experiments that (a) involve the final version of the deployed model (b) involve ARC doing its own fine-tuning, before a reliable judgment of the risky emergent capabilities of GPT-4-launch can be made”.
“We have previously worked with Anthropic, OpenAI, and other companies to pilot some informal pre-deployment evaluation procedures. These companies have also given us some kinds of non-public access and provided compute credits to support evaluation research.
We think it’s important for there to be third-party evaluators with formal arrangements and access commitments - both for evaluating new frontier models before they are scaled up or deployed, and for conducting research to improve evaluations.
We do not yet have such arrangements, but we are excited about taking more steps in this direction.”
If you’re interested in helping METR conduct third-party evaluations in-house, or in helping us support government and other auditors in carrying out evaluation protocols we design, please express interest in working with us or apply to our current openings.