Concrete suggestion: OpenAI should allow the Safety Advisory Group Chair and the head of the Preparedness Team to have “veto power” on model development and deployment decisions.
Quite possibly a good idea, but I think it's less obvious than it seems at first glance:
Remember that a position's having veto power will tend to have a large impact on selection for that position.
The comparison isn't [x with veto power] vs [x without veto power].
It's [x with veto power] vs [y without veto power].
If y would tend to have deeper understanding, more independence, or more caution than x, it's not obvious that giving the position veto power helps. Better to have someone who'll spot problems and need to use persuasion than someone who can veto but spots no problems.
In this post, I summarize some of my thoughts on OpenAI's recent preparedness framework. The post focuses on my opinions/analysis of the framework rather than summarizing it (for readers unfamiliar with the framework, I recommend Zvi’s post).
As a high-level point, I believe voluntary commitments from individual labs will be insufficient, and I'm most excited about policies that governments could apply to all AGI corporations. Nonetheless, I think efforts from individual labs can be useful, and it's important for government-focused AI policy experts to engage with the ideas coming out of top AGI companies[1].
With that in mind, throughout the post, I try not to harp on pessimistic points like "more is needed", "this document hasn't single-handedly saved the day or addressed race dynamics", or "how does this prevent competitive pressures and race dynamics from forcing everyone to proceed incautiously?" (I think these critiques are useful, but they aren't the purpose of this post.) Instead, I focus more on "what do I think OpenAI's preparedness team did well?" and "what are some things OpenAI's preparedness team could realistically change that would make this even better?"
Communication[2]
I previously noted that I was critical of the way that Anthropic communicated about its Responsible Scaling Policy. It seemed to me like Anthropic’s RSP was written in a corporate style that made it somewhat hard to identify the core content, and that it signaled to policymakers that the situation was “under control” and that scaling could occur responsibly as long as companies released responsible scaling policies.
To my surprise, I think OpenAI’s preparedness framework communicated its ideas in a much clearer and more concern-inducing way. Here are a few specific things I appreciated about the style/tone/communication in OpenAI’s preparedness framework:
Content
What about the substance of the preparedness framework? I applaud OpenAI for being relatively concrete about the kinds of conditions that would cause them to pause development and deployment.
I especially liked the following:
Critiques & Recommended Changes
Thus far, the post has mostly focused on things I appreciate about the PF. In this section, I offer some critiques and recommended changes.
(though the board might be able to remove the CEO). As Zvi notes, it would be preferable (from a safety perspective) to have multiple parties that are able to “veto” the development or deployment of a potentially-catastrophe-capable model. If the OpenAI CEO believes a model is safe, but the head of the Safety Advisory Group or the head of the Preparedness Team thinks the model has an unacceptably high chance of causing a catastrophe, this seems like a good reason not to develop or deploy the model.

Finally, although this wasn't the purpose of the PF, I'd be excited to see OpenAI's preparedness team (and similar teams at other scaling labs) produce recommendations for governments. Insofar as internal corporate policies can pave the way for strong government policies (e.g., a licensing regime or self-certification regime that requires certain kinds of eval results to be shared with the government), it would be excellent to see companies publicly and loudly advocating for such policies.
See also: On OpenAI’s preparedness framework (Zvi), comment about Anthropic’s RSP (me), My thoughts on OpenAI’s alignment plan (me, somewhat outdated), How evals might (or might not) prevent catastrophic risks from AI (me, somewhat outdated), Six Dimensions of Operational Adequacy in AGI projects (Yudkowsky).
Aside: I personally believe that the safest pathway toward AGI would involve a moratorium on the development of superintelligence, ending the race toward superintelligence, and establishing a single international AGI safety project. I’m also generally more excited about government-based policy proposals (e.g., licensing) that would ensure that safety standards are applied to all AGI companies. However, I think internal efforts by individual AGI companies, while insufficient, can still be useful, and they can provide helpful inspiration for government policy. I also think that it’s generally epistemically useful for folks who focus on lab governance to occasionally red-team the work of folks focused on government policy, and vice versa.
Why does good communication matter in the first place? It seems plausible to me that some readers might think I’m spending too much time analyzing the “style” of the document as opposed to its content. My short response is that I think communication/style matters a lot for public-facing documents, especially ones that are meant to inform policy. I model policymakers as extremely busy people who often have 10+ priorities they are tasked with. To the extent that one can make a document clearer or improve the “vibes” that get communicated in a quick skim, I think this has a meaningful effect on the AI policy discourse. Of course, it is also extremely important to have detailed & nuanced content for experts to discuss/debate, but I think it’s easy for experts in an area to underestimate the importance of tone/style/clarity/vibes. See also Pinker on the Curse of Knowledge.