This is a linkpost for https://arxiv.org/abs/1811.10840v1
How does "Safety-II" compare with Eliezer's description of security mindset? On the surface they sound very similar, and I would expect highly reliable organizations to value a security mindset in some form.
I don't recall how Eliezer thinks of security mindset, but in its original context it's more about thinking like an adversary and designing things knowing that they will be subject to attack from multiple angles and that you might fail to anticipate all angles of attack so you better be ready for the unexpected.
This academic note was previously linked, but not discussed, in AN#35; I finally got around to reading it, really liked it, and wanted to highlight it with a post.
The abstract:
The gist is that AI, including narrow AI, can be a dangerously powerful technology, similar to nuclear power and weapons, and that the organizations that best control such systems share certain features that characterize them as High Reliability Organizations (HROs). The features of an HRO (as summarized on Wikipedia, drawing on this book by the originators of the HRO concept) are:

- Preoccupation with failure
- Reluctance to simplify interpretations
- Sensitivity to operations
- Commitment to resilience
- Deference to expertise
The note argues for finding ways to incorporate AI into an HRO not only as a technology to be controlled by the organization (that's just the baseline), but also as a functional member of the HRO, with the same responsibilities and rights within the organization as any human member would have: for example, the right to halt operations if it believes danger is imminent, and the responsibility to report anomalies it discovers.
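To make the "functional member" idea a bit more concrete, here is a minimal, hypothetical sketch (every name, class, and threshold below is my own invention, not something from the note) of an AI component that carries those two obligations: it must report the anomalies it sees, and it can halt operations on its own judgment when it believes danger is imminent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Anomaly:
    """A single observation the AI member is obliged to surface."""
    description: str
    severity: float  # assumed scale: 0.0 (benign) to 1.0 (imminent danger)


@dataclass
class AIMember:
    """Hypothetical AI participant in an HRO, holding the same duties as a human
    member: it must report every anomaly it discovers, and it may halt operations
    when it believes danger is imminent."""
    halt_threshold: float = 0.9  # assumed policy knob, not from the note
    anomaly_log: List[Anomaly] = field(default_factory=list)

    def report_anomaly(self, anomaly: Anomaly) -> None:
        # Responsibility: every anomaly is recorded where the rest of the org can see it.
        self.anomaly_log.append(anomaly)
        print(f"[anomaly] {anomaly.description} (severity={anomaly.severity:.2f})")

    def should_halt(self, anomaly: Anomaly) -> bool:
        # Right: the AI member can stop work on its own judgment; nobody has to
        # approve the stop, only the restart.
        self.report_anomaly(anomaly)
        return anomaly.severity >= self.halt_threshold


def run_operations(ai_member: AIMember, observations: List[Anomaly]) -> None:
    for obs in observations:
        if ai_member.should_halt(obs):
            print("[halt] AI member judged danger imminent; operations stopped.")
            return
    print("[ok] operations completed.")


if __name__ == "__main__":
    member = AIMember()
    run_operations(member, [
        Anomaly("sensor drift on line 3", severity=0.4),
        Anomaly("coolant pressure outside operating envelope", severity=0.95),
    ])
```

The only design point the sketch is meant to show is that stop-work authority and anomaly reporting are part of the member's interface with the organization, not features bolted on afterward.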
An interesting extension of this line of thinking would be to combine the HRO Safety-I approach with newer Safety-II approaches. (An overly short summary: Safety-I, the traditional approach to safety, is about taking steps to avoid errors, while Safety-II is about creating robust conditions for success in which errors are less likely to happen.)
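As a very loose illustration drawn from my own field rather than from the note or the Safety-II literature, the difference shows up even in how you might write an ordinary request handler: a Safety-I style enumerates known errors and blocks them, while a Safety-II style tries to arrange conditions so that ordinary variability still ends in an acceptable outcome. All names below are hypothetical.

```python
import random


# Hypothetical dependencies, only so the sketch is self-contained and runnable.
def lookup(user_id: str) -> str:
    if random.random() < 0.3:  # simulate a flaky downstream service
        raise TimeoutError("profile service timed out")
    return f"profile:{user_id}"


def cached_or_default(user_id: str) -> str:
    return f"stale-profile:{user_id}"  # degraded but acceptable answer


# Safety-I flavor: enumerate the known ways a request can go wrong and refuse them.
def handle_request_safety_one(payload: dict) -> str:
    if "user_id" not in payload:
        raise ValueError("missing user_id")    # block a known error
    if payload.get("size", 0) > 1000:
        raise ValueError("request too large")  # block another known error
    return lookup(payload["user_id"])          # and hope the rest goes well


# Safety-II flavor: assume variability is normal and shape the conditions so the
# request still succeeds (perhaps in degraded form) when things wobble.
def handle_request_safety_two(payload: dict) -> str:
    user_id = payload.get("user_id", "anonymous")  # degrade rather than reject
    for _ in range(3):                             # tolerate transient flakiness
        try:
            return lookup(user_id)
        except TimeoutError:
            continue
    return cached_or_default(user_id)              # safe fallback, never a crash


if __name__ == "__main__":
    try:
        print("[safety-I]", handle_request_safety_one({"size": 5}))
    except ValueError as err:
        print("[safety-I] rejected:", err)
    print("[safety-II]", handle_request_safety_two({"user_id": "alice"}))
```

Real Safety-II work is about people and organizations, not retry loops, but the code-level analogy points at the same habit: design for success under variability rather than only guarding against the failures you already know about.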
I've seen good outcomes from applying these kinds of organizational-design changes to achieve safety within my primary field of work (system operations, site reliability engineering, devops, or whatever you want to call it). I think extending this kind of thinking to how organizations work with AI is likely to be similarly valuable: it won't theoretically eliminate the risk of AI accidents, but it should reduce the kind of accidents that would, in retrospect, look preventable.