I think the most important concept here is Normalization of Deviance. I think this is the default and expected outcome if an organization tries to import high-reliability practices that are very expensive, that don't fit the problem domain, or that lack the ability to simulate failures. Reporting culture/just culture seems correct and important, and there are a few other interesting lessons, but most stuff is going to fail to transfer.
Most lessons from aviation and medicine won't transfer, because these are well-understood domains where it's feasible to use checklists, and it's reasonable to expect investing in the checklists to help. AI research is mostly not like that. A big risk I see is that people will try to transfer practices that don't fit, and that this will serve to drive symbolic ineffectual actions plus normalization of deviance.
That said, there are a few interesting small lessons that I picked up from watching aviation accident analysis videos, mostly under the label "crew resource management". Commercial airliners are overstaffed to handle the cases where workload spikes suddenly, or someone is screwing up and needs someone else to notice, or someone becomes incapacitated at a key moment. One recurring failure theme in incidents is excessive power gradients within a flight crew: situations where the captain is f*ing up, but the other crewmembers are too timid to point it out. Ie, deference to expertise is usually presented as a failure mode. (This may be an artifact of there being a high minimum bar for the skill level of people in a cockpit, which software companies fail to uphold.)
(Note that I haven't said anything about nuclear, because I think managing nuclear power plants is actually just an easy problem. That is, I think nearly all the safety strategies surrounding nuclear power are symbolic, unnecessary, and motivated by unfounded paranoia.)
(I also haven't said much about medicine because I think hospitals are, in practice, pretty clearly not high-reliability in the ways that matter. Ie, they may have managed to drive the rates of a few specific legible errors down to near zero, but the overall error rate is still high; it would be a mistake to hold up surgical-infection-rate as a sign that hospitals are high-reliability orgs, and not also observe that they're pretty bad at a lot of other things.)
More on the normalization of deviance: The Challenger Launch Decision by Diane Vaughan (book outline).
I think managing a nuclear power plant is relatively easy... but, like, I'd expect managing a regular ol' power plant (or, many other industries) to be pretty easy, and my impression is that random screwups still manage to happen. So it still seems notable to me if nuclear plants avoid the same degree of incidents that other random utility companies have.
I agree many of the safety strategies surrounding nuclear power are symbolic. But I've recently updated away from "yay nuclear power": while the current generation of stuff seems pretty safe, and is being sort of unreasonably strangled by regulation, I dunno that I actually expect civilizational competence to keep it safe going forward, given how I've seen the rest of civilizational competence playing out. (I'm not very informed here, though.)
Prediction: that list of principles was written by a disciple of John Boyd (the OODA loops guy). This same post could just as easily be organizational cliffnotes from Patterns of Conflict.
I also thought these looked similar, so I spent a half-hour or so searching, and I could not turn up any relation between either of the authors of the ResearchGate summary and Boyd or the military, as far as their Wikipedia pages and partial publication lists go. It appears those two have been writing books together on this set of principles since 2001, based on work going back to the '60s and drawing from the systems management literature.
I also checked for links between Rickover and Boyd, which seemed plausible because one of Boyd's other areas of achievement was as a ruthless trainer of fighter pilots, which seemed connected to the Navy's nuclear training program. Alas, a couple of searches only turned them up together in the same document once, in a generic media article about famous ideas from the military.
It sort of looks like Rickover landed on a similar set of principles to Boyd's, but with a goal more like enforcing a maximum loop size, organization-wide, for responding to circumstances.
Self Review.
I wasn't sure at the time whether the effort I put into this post would be worth it. I spent around 8 hours, I think, and I didn't end up with a clear gearsy model of how High Reliability tends to work.
I did end up following up on this, in "Carefully Bootstrapped Alignment" is organizationally hard. Most of how this post applied there was me including the graph from the vague "hospital Reliability-ification process" report, where I argued:
The report is from Genesis Health System, a healthcare service provider in Iowa that serves 5 hospitals. No, I don't know what "Serious Safety Event Rate" actually means; the report is vague on that. But my point here is that even when I optimistically interpret this graph as making a serious claim about Genesis improving, the improvements took a comprehensive management/cultural intervention over the course of 8 years.
I know people with AI timelines of less than 8 years. Shane Legg from DeepMind said he put 50/50 odds on AGI by 2030.
If you're working at an org that's planning a Carefully Aligned AGI strategy, and your org does not already seem to hit the Highly Reliable bar, I think you need to begin that transition now. If your org is currently small, take proactive steps to preserve a safety-conscious culture as you scale. If your org is large, you may have more people who will actively resist a cultural change, so it may be more work to reach a sufficient standard of safety.
I don't know whether it's reasonable to use the graph in this way (i.e., I assume the graph is exaggerated and confused, but it still seems suggestive of a lower bound on how long it might take an organization's culture and practices to shift towards high reliability).
After writing "Carefully Bootstrapped Alignment" is organizationally hard, I spent a couple of months exploring and trying to understand why the AI-safety-focused members of DeepMind, OpenAI, and Anthropic weren't putting more emphasis on High Reliability. My own efforts there petered out, and I don't know that they were particularly counterfactually helpful.
But later on, Anthropic did announce their Responsible Scaling Policy, which included language that seems informed by biosecurity practices (since writing this post, I went on to interview someone about High Reliability practices in bio, and they described a schema that seems to roughly map onto the Anthropic security levels). I'm currently kind of on the fence about whether Anthropic's policy has teeth or is more like elaborate Safetywashing, but I think it's at least plausibly a step in the right direction.
Epistemic Effort: Rough notes from a shallow dive. Looked into this for a few hours. I didn't find a strong takeaway, but I think this is probably a useful jumping-off point for further work.
Most likely, whether I like it or not, there will someday be AGI research companies working on models that are dangerous to run. Maybe they'll risk an accidental unfriendly hard takeoff. Maybe they'll cross a threshold that accelerates us into a hard-to-control multipolar smooth takeoff.[1]
There's some literature on organizations that operate in extremely complex domains, where failure is catastrophic. They're called High Reliability Organizations (HROs). The original work focused on three case studies: a nuclear power plant, an air traffic control company, and a nuclear aircraft carrier (aka nuclear power plant and air traffic control at the same time, while people sometimes shoot at you, and you can't use radar because you don't want to give away your position, and also it's mostly crewed by 20-year-olds without much training).
These were notable for a) being extremely complex systems where it would be really easy to screw up catastrophically, and b) somehow managing to persistently not screw up.
How do they do that? And does this offer any useful insights to AGI companies?
I started writing this post before Eliezer posted Six Dimensions of Operational Adequacy in AGI Projects. It's not pointed at the exact same problem, but I had a similar generator of "what needs to be true of AI companies, for them to safely work on dangerous tech?" (I think Eliezer had a higher bar in mind with Six Dimensions, which is more like "what is an AI company a researcher could feel actively good about joining?")
I was initially pointed in the HRO direction by some conversations with Andrew Critch. Some of his thoughts are written up in the ARCHES report. (There's been some discussion on LessWrong)
My TL;DR after ~10 hours of looking into it:
Principles from "Managing the Unexpected"
The book "Managing the Unexpected", first published in 2001, attempts to answer the question "how can we learn from Highly Reliable Organizations?" in a more general way. I found a summary of the book on ResearchGate, which I'm going to quote from liberally. It distills the book down into these points:
I'm not sure how well this all translates to AI research, but the list was interesting enough for me to buy the book. One thing that was quite unclear to me was what the epistemic status of the book was – it prescribes a bunch of interventions on organizational culture, but it doesn't say "we tried these interventions and it worked."
Applications in Hospitals
Fortunately(?), it seems like since the book was first published, there has been a massive effort to port HRO principles over into the hospital industry. (In fact, when I googled "High Reliability Organizations", the first several Google hits were reports about hospitals saying "Man, we kill people all the time. Why can't we be reliable like those nuclear aircraft carrier people in the HRO literature?") Some programs started around 2007, and were evaluated in 2018.
According to this report, a collection of organizational interventions resulted in serious safety events dropping to zero over the course of 9 years. They define "Serious Safety Event" as "a deviation from the standard of care that reaches the patient and leads to more than minor or minimal temporary harm".
Notes on the methodology here:
I do notice "deviation from the standard of care" is a kinda vague concept that depends on whatever the standard of care is. But, assuming they held the standard consistent throughout and didn't solve the problem by massive goodharting... it seems Big If True that they drove things down to literally zero. That's at least encouraging on the general level of "we have working examples of industries becoming safer on purpose."
Does any of this translate into "Research" environments?
Nuclear power plants, air traffic control, aircraft carriers, and hospitals all share the property of having pretty clear feedback loops. And while a lot of the point of being an HRO is responding to surprises, they are in some sense attempting roughly the same thing, with the same desired outcome, repeatedly. If you mess up, an accident is likely to happen pretty quickly.
Some of the principles ("focus on operations rather than strategy", and "the more volatile your work environment, the more important to respond well in the moment") seem tailored to tactical jobs rather than research companies. (I can imagine smooth takeoff worlds where an AI company does need to respond quickly in the moment to individual things going wrong... but it feels like something has already gone wrong by that point.)
Some of the ideas I'd guess would transfer:
I expect this one to be a bit complex...
...because you might have people who are the most competent at AI design but don't have security/alignment mindset.
An interesting one was...
...which actually feels like it maybe applies to the broader x-risk/EA ecosystem.
Appendix: Cliff Notes on "Managing the Unexpected"
[The following is from a ResearchGate summary, which I found pretty annoyingly formatted. I've translated it into LW-post format to make it easier to read.]
[1] For some examples, see this short story by gwern for a gesture at how an AGI experiment might destroy the world accidentally, or Paul Christiano's What failure looks like, or Andrew Critch's What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs).