So, this isn't exactly a critique of your work, but more of a meta-note about the choice of focus and the assumptions it implies. You have chosen to focus on leading AI companies and their behavior. This implies the assumption that they are the only relevant force, and will remain so. I just want to point out that this isn't my view.
I think the open source community is also a substantial source of risk, and its risk will grow faster than that of any responsible company. Indeed, a major risk from a company (e.g. Meta) lies in how much it contributes to progress in the open source community.
Why? Because misuse risk is a huge deal, for example using AI to help create bioweapons. This is already a civilizational-scale risk and is getting rapidly worse. Eventually, AGI misalignment risks will also spread to a broader and broader set of actors. If all the major companies behave responsibly themselves, but don't help reduce the rate at which the open source community is catching up, and if the government doesn't figure out a way to regulate this... then we should expect to see all the same harms you might predict an incautious AI lab could lead to. Just a few years later.
For more of my thoughts on this: https://www.lesswrong.com/posts/oJQnRDbgSS8i6DwNu/the-hopium-wars-the-agi-entente-delusion?commentId=8GSmaSiePJusFptLB
Reading guidelines: If you are short on time, just read the section “The importance of quantitative risk tolerance & how to turn it into actionable signals”
Tl;dr: We recently published an AI risk management framework. It draws on both established risk management approaches and existing AI risk management practices. We then adapted it into a rating system with quantitative, well-defined criteria to assess how well AI developers implement adequate AI risk management. The upshot is that all companies are still far from "strong AI risk management" (a grade above 4 out of 5). You can have a look at our website to see our results presented in an accessible manner: https://ratings.safer-ai.org/
Motivation
The literature on risk management is very mature and has been refined by a range of industries over decades. However, OpenAI's Preparedness Framework, Google DeepMind's Frontier Safety Framework, and Anthropic's Responsible Scaling Policy do not explicitly reference the risk management literature.
By analyzing these documents in more depth, we identified several deficiencies: the absence of a defined risk tolerance, the lack of semi-quantitative or quantitative risk assessment, and the omission of a systematic risk identification process. We propose a risk management framework that fixes these deficiencies.
AI risk management framework
Risk management dimensions
Our framework is centered on three main dimensions.
The following figure illustrates these dimensions with examples for each:
The importance of quantitative risk tolerance & how to turn it into actionable signals
The most important part is risk tolerance and analysis.
The first step here is to define a quantitative risk tolerance (also called a risk threshold): a quantitative statement, expressed as a product of the probability and severity of risks, of the level of risk a company finds acceptable. Currently, no AI company does this. For example, in “Responsible Scaling Policy Evaluations Report – Claude 3 Opus”, Anthropic states: "Anthropic's Responsible Scaling Policy (RSP) aims to ensure we never train, store, or deploy models with catastrophically dangerous capabilities, except under a safety and security standard that brings risks to society below acceptable levels."
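To make the idea concrete, here is a minimal sketch of what checking risk estimates against a quantitative risk tolerance could look like. The scenario names, probabilities, severities, and the tolerance itself are all hypothetical illustrations, not real estimates or anyone's actual threshold:

```python
# Hypothetical sketch: a quantitative risk tolerance stated as a maximum
# acceptable expected harm (here: expected fatalities per year attributable
# to a deployed model). All numbers are illustrative only.
RISK_TOLERANCE = 10.0

# Illustrative risk scenarios: (annual probability, severity in fatalities).
scenarios = {
    "ai_assisted_bioweapon": (1e-5, 1e6),
    "large_scale_cyberattack": (1e-3, 1e3),
    "autonomous_replication": (1e-6, 1e5),
}

def total_expected_harm(scenarios):
    """Risk as the sum over scenarios of probability x severity."""
    return sum(p * s for p, s in scenarios.values())

risk = total_expected_harm(scenarios)
print(f"Estimated risk: {risk:.2f} expected fatalities/year")
if risk <= RISK_TOLERANCE:
    print("Within tolerance")
else:
    print("Exceeds tolerance: strengthen mitigations or do not deploy")
```

Even a toy calculation like this forces the two judgments that capability thresholds leave implicit: what severity and probability you assign to each scenario, and what total level of risk you are willing to accept.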
Whether they realize it or not, when companies define a capability threshold they are picking an implicit quantitative risk tolerance. This becomes clear in conversations about whether, e.g., ASL-3 mitigations are sufficient for an ASL-3 level of capabilities (ASL-3 is an “AI safety level” defined in Anthropic's Responsible Scaling Policy; it corresponds to a certain level of dangerous AI capabilities). Such a conversation would typically go along the following lines:
In this conversation, which we ran into quite a lot, there is necessarily a moment where you have to discuss:
Setting capability thresholds without additional grounding just hides this difficulty and confusion; it doesn't resolve it. Many disagreements about the acceptability of capability thresholds and mitigations are disagreements about acceptable levels of risk. We can't resolve those with evaluations alone. We're just burying them.
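The point that such disagreements are really about risk tolerance can be sketched in a few lines. In this hypothetical example, two parties share the exact same evaluation results (the same probability and severity estimates) and still reach opposite conclusions about whether the mitigations suffice, purely because their implicit tolerances differ. All numbers are made up:

```python
# Hypothetical sketch: same capability evaluation, two different (usually
# implicit) risk tolerances. Numbers are purely illustrative.

# Shared estimate: annual probability of a catastrophic misuse event for a
# model at a given capability level, under a given set of mitigations.
p_catastrophe = 2e-4
severity = 1e5  # illustrative severity, e.g. fatalities

risk = p_catastrophe * severity  # expected fatalities/year

# Two parties with different maximum acceptable expected harms.
tolerances = {"Party A": 100.0, "Party B": 1.0}

for party, tolerance in tolerances.items():
    verdict = "sufficient" if risk <= tolerance else "insufficient"
    print(f"{party} (tolerance {tolerance}): mitigations are {verdict}")
```

No amount of additional evaluation data resolves this disagreement; only making the tolerances explicit and arguing about them directly can.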
We suspect that some AI developers may already be internally implementing a process along these lines, but are not publicizing it because the numbers seem unserious. The status quo is therefore likely that AI developers are:
The absence of justification for the resulting capability thresholds makes it impossible to push back, as there is nothing to push back against. The capability thresholds feel arbitrary.
We think that AI risk management should go towards much more quantitative practices. The path we propose is the following:
There are multiple benefits to moving toward a more quantitative approach to AI risk management:
Nuclear risk management followed this path with probabilistic risk assessment (this post provides insights into the transition). Even though the estimates were poor at first, producing them created a feedback loop: it compelled experts to make their methodologies public, invite external scrutiny, and iteratively refine their approaches in collaboration with regulators. This process ultimately led to a significantly better understanding of potential failure modes in nuclear reactors and to significantly stronger safety measures.
Mapping risks to AI capabilities is hard. At the moment, there is little public literature on the risk modeling, expert elicitation, and data baselines needed. To start filling this gap, we have kicked off a new research project to develop a quantitative risk assessment methodology, beginning with the risk of AI-enabled cyberattacks as a proof of concept. You can see our preliminary research plan here. We'd love to receive feedback; feel free to reach out to malcolm@safer-ai.org if you would like to comment or collaborate.
Rating the risk management maturity of AI companies
To assess how good companies are at risk management, we converted our risk management framework into ratings by creating scales from 0 to 5.
0 corresponds to non-existent risk management practices and 5 corresponds to strong risk management. The result of this assessment is the following:
We observe that all companies are far from a perfect score and, as mentioned earlier, the criterion on which companies fall furthest short is general risk tolerance.
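As an illustration of how per-criterion scores on a 0 (non-existent) to 5 (strong) scale can roll up into an overall grade, here is a minimal sketch. The aggregation rule (a simple average), the criterion names, and the scores are all hypothetical; this is not SaferAI's actual methodology:

```python
# Hypothetical sketch of rolling up 0-5 per-criterion scores into an overall
# grade. The averaging rule, criterion names, and scores are illustrative
# assumptions, not the real rating methodology.

def overall_grade(scores):
    """Average per-criterion scores on a 0 (non-existent) to 5 (strong) scale."""
    assert all(0 <= s <= 5 for s in scores.values()), "scores must be in [0, 5]"
    return sum(scores.values()) / len(scores)

# Fictional scores for a fictional developer.
example = {
    "risk identification": 2.5,
    "risk tolerance & analysis": 1.0,
    "risk mitigation": 3.0,
}
grade = overall_grade(example)
print(f"Overall grade: {grade:.2f} / 5 -> {'strong' if grade > 4 else 'not yet strong'}")
```

Whatever the exact aggregation rule, the shape of the exercise is the same: a grade above 4 out of 5 would count as strong risk management, and a weak score on any one criterion (here, risk tolerance) drags the overall grade well below that bar.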
We plan to update our methodology in future iterations to make it more comprehensive. For instance, our current framework does not account for internal governance structures. Incorporating these elements is tricky, as they are largely upstream factors that influence performance across all the variables currently included in our rating. Adding them directly might introduce double-counting issues. However, we recognize their importance and want to tackle them in our next iteration.
We will also update the ratings based on updates from companies (note that we haven't incorporated Anthropic's RSP update).
We hope that our ratings will incentivize AI companies to improve their risk management practices. We will work to ensure that they are usable by policymakers, investors, and model deployers who care about AI risk management.
You can find our complete methodology here. You can find our complete assessment of companies’ risk management practices on this website.