Last week, OpenAI released the official version of o1, alongside a system card explaining their safety testing framework. Astute observers, most notably Zvi, noted something peculiar: o1's safety testing was performed on a model that... wasn't the release version of o1 (or o1 pro).

Weird! Unexpected! If you care about AI safety, bad! If you fall in that last camp, your reaction was probably something like Zvi's:

That’s all really, really, really not okay.

While Zvi's post thoroughly examines the tests, their unclear results, etc., I wanted to zoom in a little more on this tweet from roon (OAI engineer):

"unironically the reason [this happened] is that progress is so fast that we have to write more of these model cards these days. the preparedness evals are more  to certify that models aren't dangerous rather than strict capability evals for showing off"

My loose translation is something like: "these tests are annoying to run, so for our use case a rough approximation is good enough."

You may not like it, but the following is simply a fact: AI safety tests are voluntary, not legally mandated. The core issue wasn't that OpenAI failed to recognize they were cutting corners; it's that the tests were a pain in the ass.

Put yourself in a researcher's shoes: you're moving fast, competing in a race to the bottom with other companies. You develop an AI model that, while powerful, clearly can't cause Armageddon (yet). Meanwhile, what you view as a slightly histrionic cadre of alarmed critics is demanding you stop work and/or spend significant personal time on safety testing that you don't think, in this instance, rises to the level of "necessary for the future of humanity." Do you:

A) Rush through it to move on with your life 

B) Meticulously complete a task you believe is excessive, crossing every t and dotting every i

The quiz is left as an exercise for the reader.

This saga is a friendly reminder: today, AI safety testing is a choice companies make, not a requirement. If researchers or developers dislike running your tests, they'll cut corners. AI safety isn't just a technical challenge; it's a behavioral economics problem.

Until something fundamentally changes (and let's be clear, that's unlikely in the near term), researcher pain (RP) is a key KPI of AI safety test quality. Shaming people is certainly a strategy, but frankly I don't think it works well.

To be solution-oriented, this presents a clear target for safety research: make tests so low-RP that even AI safety skeptics see no cost in running them. Actually being run is an important quality of a safety eval!