AI companies' commitments

Crossposted from AI Lab Watch. Subscribe on Substack.

AI Lab Watch has a page on commitments. It's the best page in the Resources section. I intend to keep it up to date. Like the rest of that section, it's not connected to the scorecard.

This post is mostly to announce that page. If that page is missing some commitments (relevant to AI safety and extreme risks or otherwise notable), please let me know. In the rest of this post, I share more abstract remarks on commitments. You should skim the page rather than read this post.

When a lab identifies a good action, it should generally not just (plan to) take it, but also announce that it is doing so. It should also demonstrate that it's doing it, if relevant. This can draw more attention to good actions, make the lab more likely to do them, let the lab get credit for doing them, and help cause other labs to do them. Labs should also sometimes explain (publicly or internally) their plans for various situations; they should distinguish this from making binding commitments.

Humans disagree and are uncertain about risks from AI and appropriate responses. This need not prevent the labs from making good commitments: they can make commitments conditional on dangers. Labs should often commit to safety measures or responses to various scenarios as a function of warning signs, not just in a vacuum. (Related: Responsible Scaling Policies.)

Sometimes it would be good if all frontier labs did something, but costly and ineffective for some particular lab to do unilaterally. In this case, the labs should make a conditional commitment: commit to do the thing if they get assurance that all other frontier labs will too, and explain how they could get such assurance.

Various good commitments have not been made by any lab, including:

- Once your model has demonstrably tried to escape, stop deploying it
- Anything concrete on using external auditors (e.g. in pre-deployment risk assessment)
- Whistleblower protection stuff
  - Never use non-disparagement agreements or otherwise discourage people from publishing concerns (except to prevent release of trade secrets and dangerous information)
