Excited to see this! I'd be most excited about case studies of standards in fields where people didn't already have clear ideas about how to verify safety.
In some areas, it's pretty clear what you're supposed to do to verify safety. Everyone (more-or-less) agrees on what counts as safe.
One of the biggest challenges with AI safety standards will be the fact that no one really knows how to verify that a (sufficiently-powerful) system is safe. And a lot of experts disagree on the type of evidence that would be sufficient.
Are there examples of standards in other industries where people were quite confused about what "safety" would require? Are there examples of standards that are specific enough to be useful but flexible enough to deal with unexpected failure modes or threats? Are there examples where the standards-setters acknowledged that they wouldn't be able to make a simple checklist, so they requested that companies provide proactive evidence of safety?
One of the biggest challenges with AI safety standards will be the fact that no one really knows how to verify that a (sufficiently-powerful) system is safe. And a lot of experts disagree on the type of evidence that would be sufficient.
While overcoming expert disagreement is a challenge, it is not as big a challenge as you might think. TL;DR: deciding not to agree is always an option.
To expand on this: the fallback option in a safety standards creation process, for standards that aim to define a certain level of safe-enough, is as follows. If the experts involved cannot agree on any evidence based method for verifying that a system X is safe enough according to the level of safety required by the standard, then the standard being created will simply, and usually implicitly, declare that there is no route by which system X can comply with the safety standard. If you are required by law, say by EU law, to comply with the safety standard before shipping a system into the EU market, then your only legal option will be to never ship that system X into the EU market.
For AI systems you interact with over the Internet, this 'never ship' translates to 'never allow it to interact over the Internet with EU residents'.
I am currently on the JTC21 committee, which is running the above standards creation process to write the AI safety standards in support of the EU AI Act, the Act that will regulate certain parts of the AI industry if they want to ship legally into the EU market. (Legal detail: if you cannot comply with the standards, the Act will give you several other options that may still allow you to ship legally, but I won't get into explaining all those here. These other options will not give you a loophole to evade all expert scrutiny.)
Back to the mechanics of a standards committee: if a certain AI technology, when applied in a system X, is well known to make that system radioactively unpredictable, it will not usually take long for the technical experts in a standards committee to come to an agreement that there is no way they can define any method in the standard for verifying that X will be safe according to the standard. The radioactively unsafe cases are the easiest cases to handle.
That being said, in all but the most trivial of safety engineering fields, the epistemics of deciding when something is safe enough to ship are complicated, whether you use standards or not. I have written about this topic, in the context of AGI, in section 14 of this paper.
I agree that, at least for the more serious risks, there doesn't seem to be consensus on what the mitigations should be.
For example, I'd be interested to know what proportion of alignment researchers would consider an AGI that's a value learner (and of course has some initial model of human values, created by humans, to start that value learning process from) to have better outer-alignment safety properties than an AGI with a fixed utility function created by humans.
For me it is very clear that the former is better, as it incentivizes the AGI to converge from its initial model of human values toward true human values, allowing it to fix problems when the initial model, say, goes out-of-distribution or lacks sufficient detail. But I have no idea how much consensus there is on this, and I see a lot of alignment researchers working on approaches that don't appear to assume the AI system is a value learner.
My suspicion is that the most instructive cases to look at (modern AI really is too new a field to have much to go on in terms of mature safety standards) are in how the regulation of nuclear and radiation safety has evolved over time. Early research suggested some serious x-risks that thankfully didn't pan out, for either scientific reasons (igniting the atmosphere) or logistical/political ones (cobalt bombs, Tsar Bomba-scale H-bombs), but some risks arising more out of the political domain (having a big gnarly nuclear war anyway) still exist that could certainly make this a less fun planet to live on. I suspect the successes and failures of the nuclear treaty system could be instructive here, given the push to integrate big AI into military hierarchies: regulating nukes is something almost everyone agrees is a very good idea, yet compliance has had a less than stellar history.
They are likely out of scope for whatever your goal is here, but I do think they need serious study, because without it, our attempts at regulation will just push unsafe AI to less savory jurisdictions.
This seems great!
One additional example I know of, which I don't have personal experience with but know that a lot of people do, is compliance with PCI DSS (for credit card processing), which deals with safety in an adversarial setting where the threat model isn't super clear.
(my interactions with it look like "yeah that looks like a lot and we can outsource the risky bits to another company to deal with? great!")
A high-level theme that would be interesting to explore here is rules-based vs. principles-based regulation. For example, the UK financial regulators are more principles-based (broad principles of good conduct, flexible and open to interpretation). In contrast, the US is more rules-based (detailed and specific instructions).
https://www.cfauk.org/pi-listing/rules-versus-principles-based-regulation
[Edit - on further investigation this seems to be a more UK-specific point; US regulations are much less ambiguous as they take a rules-based approach unlike the UK's principles-based approach]
It's interesting to note that financial regulations sometimes possess a degree of ambiguity and are subject to varying interpretations. It's frequently the case that whichever institution interprets them most stringently or conservatively effectively establishes the benchmark for how the regulation is understood. Regulators often use these stringent interpretations as a basis for future clarifications or refinements. This phenomenon is especially observable in newly introduced regulations pertaining to emerging forms of fraud or novel technologies.
Update: the case studies collected[1] via this project that authors have agreed to make publicly available are here (including some that already have public links). We are not currently taking applications for new projects. I will likely post some reflections from reading the case studies at a later date.
I’m looking for concise, informative case studies on social-welfare-based standards[2] for companies and products (including standards imposed by regulation).
I think case studies could help a lot with making AI safety standards work.
This post outlines:
Some quick background on AI safety standards
The basic idea of AI safety standards would be:
If something like this happened, it could (a) make dangerous AI deployments more costly and less likely; (b) reduce “race dynamics” in which companies have to choose between releasing dangerous models and fearing that their competitors will do so; (c) increase incentives for alignment research and other danger-reducing measures (since these things, if done well, might allow companies to release powerful systems while staying in compliance with standards).
One of the things that appeals to me about this general model is that there is plenty of precedent for similar models in other industries. It’s common for companies to voluntarily follow social-welfare-oriented standards established by third parties, aiming - through compliance with standards - to increase confidence in the social responsibility of their work. Sometimes these standards are quite detailed and take a lot of work and/or expense both to create and follow. And there’s also precedent for initially voluntary standards to end up codified in regulation.
Some relevant examples include farm animal welfare standards (governing how animals are treated on farms), environmental standards (governing companies’ environmental impacts), security standards (governing e.g. how customer data is protected), safety standards (for airplanes, wetlabs and more), and financial standards aimed at e.g. preventing a bank collapse. More below.
Some ways case studies can be useful
There’s a lot of interest in AI safety standards right now, and I’m encountering a lot of differences of opinion on questions like:
I think that studying cases of existing widely-adopted standards can shed a lot of light on how these questions have been answered in other cases (both successful and unsuccessful).
They can thus inform the strategies taken by people looking to write or help shape safety standards that are both highly protective and widely adopted.
So far I’ve done one mini-case-study: a case study on farm animal welfare standards based on a conversation with Lewis Bollard. I’ve picked up a number of things from this that may be useful to people working on standards, such as:
I think case studies can also help us a lot with the general problem that we don’t know what we don’t know.
My impression is that standards often take a very long time to take shape and gain wide adoption. If we want to “speedrun” this process due to the possibility of transformative AI being developed soon, learning as quickly and thoroughly as possible how things have worked elsewhere seems important.
Narrowing down standards to learn about
There are an enormous number of standards out there (ISO alone maintains almost 25,000). I’m especially interested in cases that share some key properties with potential AI safety standards. In particular:
I’m interested in intense standards for high-stakes applications. Some standards are relatively lightweight (e.g., international food standards); higher-stakes standards tend to be more intense, and I think the latter will be most appropriate for potentially transformative AI systems.
Example high-stakes standards: biosafety standards (see BMBL as well as the Federal Select Agent Program); nuclear safety standards (e.g., IAEA’s); safety standards for chemical producers; airline safety standards; and standards and regulations that the FDA imposes on drugs.
I’m interested in standards that involve complex, sometimes creative risk assessment and/or intense, even adversarial auditing. Some standards seem straightforward to observe and verify (farm animal welfare standards are an example); I don’t think we can count on this being the case for AI, where it can take a lot of knowledge and creativity to answer questions like “What dangerous activities is this AI system really capable of?”
I think financial regulation and financial standards (e.g., FINRA) are a promising place to look for this sort of thing, since financial risks are often hard to understand and assess. (I’m told that in some cases, regulators are embedded within a financial company, going to work every day in the company’s office [I was prodded about this and couldn't confirm it, and think it's probably not right]; also see this interesting Twitter thread arguing that the bank supervision model is promising for AI.)
Some other promising categories:
I’m interested in standards that are more complex than just “checklists.” Most standards are something like: “You meet the standard if and only if the following things are all true of your company/product.” But I think AI safety standards might have to involve more complex conditions, like: “If an AI strongly demonstrates dangerous property X, then mitigation measures ___ are required; if it only weakly demonstrates dangerous property X, then lesser mitigation measures ____ are required.”
Here again financial regulations might be useful, for example the Large Financial Institution Rating System.
Institutional Review Boards might be useful as well, and have some other parallels (e.g., their approval is required before performing research).
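The contrast between a checklist standard and a conditional, tiered standard can be sketched in a few lines of code. This is purely illustrative: the threshold values, the notion of a single "danger score," and the mitigation names are all invented here, not drawn from any actual standard.

```python
# Hypothetical sketch contrasting a checklist-style standard with a
# tiered, conditional one. All thresholds and mitigation names are
# invented for illustration; a real standard would define these terms.

def checklist_compliant(checks: dict[str, bool]) -> bool:
    """Checklist standard: compliant iff every required item is satisfied."""
    return all(checks.values())

def required_mitigations(danger_score: float) -> list[str]:
    """Conditional standard: required mitigations scale with demonstrated danger.

    danger_score is a stand-in for some assessed capability level.
    """
    if danger_score >= 0.8:    # strongly demonstrates the dangerous property
        return ["strong_containment", "third_party_audit", "staged_release"]
    elif danger_score >= 0.4:  # only weakly demonstrates it
        return ["internal_review", "monitoring"]
    return []                  # no evidence of the dangerous property

print(checklist_compliant({"data_encrypted": True, "audit_done": False}))  # False
print(required_mitigations(0.5))  # ['internal_review', 'monitoring']
```

The point of the second function is that compliance is not a fixed list of boxes to tick: what is required depends on what the risk assessment finds, which is much harder to audit.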
I’m interested in standards that are motivated by non-monetized social welfare.
All else equal, standards for things that are more similar to AI are better. (E.g., software is probably better to examine than food, although other factors here could outweigh this.)
I’m interested in failure stories, not just success stories. A good example might be bond credit ratings: third-party certifiers of creditworthiness came to play an important role in the economy, but they failed to correctly assess creditworthiness (when accounting for e.g. systemic risk), leading some institutions that were supposed to be conservative to take on too much risk (more).
I’m especially interested in private/voluntary standards, and even more especially in cases where private/voluntary standards helped shape later regulation, though I’m not exclusively interested in these (some of the examples above are regulation-backed standards).
What I’m looking for in case studies
I’m looking for case studies that:
Other projects I might be interested in
I’m also interested in writeups that look for patterns across a large number of standards. Example topics include:
In general, feel free to use the form below to pitch me on any analysis you think could be useful, although I expect to be most likely to support analysis that is heavily about learning from past/existing cases (rather than about making abstract arguments).
Who can do case studies, and how can they find the relevant information?
I don’t think you need to be a subject-matter expert to do a good case study. You just need to be able to find the relevant information about how a standard works, how the process for maintaining it works, etc. This could be by:
How to participate
Please use this form to (a) let me know about your interest in doing a case study or other writeup; (b) apply for funding to support the work. My basic default is to offer funding for up to 50 hours per case study, with room for negotiation in special circumstances. The rate of pay will be at least $75/hour for all approved cases, and could be higher (the form submission asks for information on this).
In any cases where I favor providing funding, I’ll make the recommendation to Open Philanthropy to do so.
If I get multiple proposals to study the same thing, I will probably do something to avoid redundancy (e.g., email the parties in question so they’re aware of overlapping efforts). This is a reason to use the form even if you’re not seeking funding.
I may also occasionally update this post to note whether some topics seem likely to already be well-covered.
Got ideas for more case studies?
Please share them in the comments! I’ve found that a lot of people happen to know of standards that are interestingly analogous to AI safety standards. Some guidance on how to look for such analogies is above.
For this post I talked to a number of people to get ideas on what good case studies might be, and on how some particular standards work. I’m grateful to Daniela Amodei, Sam Bell, Alexander Berger, Lewis Bollard, Alexis Carlier, Rocco Casagrande, Ben Garfinkel, Jonathan Gleklen, Mindy James, Richard Korzekwa, Jade Leung and Piers Millett for help and/or suggesting good example standards to learn about. These folks shouldn’t be seen as responsible for the content of the post.
Notes
1. Most of these case studies were directly paid for via this project, but in some cases the work was pro bono, or someone adapted or sent a copy of work that had been done for another project, etc. ↩
2. For a nice definition of standards, see ISO’s definition. ↩
3. One language model claimed that standards such as BSL-4 originated with the Asilomar conference on recombinant DNA, but I haven’t been able to find any source supporting this, and one biorisk expert I talked to was pretty sure it was false. ↩