I've been working fully remotely and have meaningfully contributed to global organizations without physical presence for over a decade. I see parallels with anti-remote and anti-safety arguments.
I've observed the robust debate regarding 'return to work' vs 'remote work,' with many traditional outlets proposing 'return to work' based on a series of common criteria. I've seen 'return to work' arguments assert remote employees are lazy, unreliable or unproductive when outside the controlled work environment. I would generalize the rationale as an assertion that 'work quality cannot be assured if it cannot be directly measured.' Given modern technology allows us to measure employee work product remotely, and given the distributed work of employees across different offices for many companies, this argument seems fundamentally flawed and perhaps even intentionally misleading. My belief in the arguments being misleading is compounded by my observations that these articles never mention related considerations like cost of rental/ownership of property and the handling of those costs, nor elements like cultural emphasis on predictable work targets or management control issues.
In my view, the reluctance to embrace remote work often distills to a failure to see beyond immediate, egocentric concerns. Along the same lines, I see failure to plan for or prioritize AI safety as stemming from a similar inability to perceive direct, observable consequences to the party promoting anti-safety mindsets.
Anecdotally, I came across an article that proposed a number of cultural goals for successful remote work. I shared the article with my company via our Slack. I emphasized that it wasn't the goals themselves that were important, but rather adopting a culture that made those goals critical. I suggested that Goodhart's Law applied here- once a measure becomes a target, it ceases to be a good measure. A culture that values and principals beyond the listed goals would succeed, not just a culture that blindly pursues the listed goals.
I believe the same can be said for AI Safety. Focusing on specific risks, or specific practices won't create a culture of safety. Instead, as the post (above) suggests, a culture that does not value the principals behind a safety-first mentality will attempt to merely meet the goals, or work around the goals, or undermine the goals. Much as some advocates for "return to work" are egocentrically misrepresenting remote work, some anti-safety advocates are egocentrically misrepresenting safety. For this reason, I've been researching the history of adoption of a safety mentality, to see how I can promote a safety-first culture. Otherwise I think we (both my company, and the industry as a whole) risk prioritizing egocentric, short-term goals over societal benefit and long-term goals.
I've been looking at the human history about adoption of safety culture, and invariably, it seems to me that safety mindsets are adopted only after loss, usually loss of human life. It is described anecdotally in the paper associated with this post.
The specifics of how safety culture is implemented differ, but the broad outlines are similar. Most critical for the development of the idea of safety culture were efforts launched in the wake of the 1979 Three Mile Island nuclear plant accident and near-meltdown. In that case, a number of reports noted the various failures, and noted that in addition to the technical and operational failures, there was a culture that allowed the accidents to occur. The tremendous public pressure led to significant reforms, and serves as a prototype for how safety culture can be developed in an industry.
Emphasis added by me.
NOTE: I could not find any indication of loss of human life attributed to Three Mile Island, but both Chernobyl and Fukushima happened after Three Mile Island, and both did result in loss of human life. It's also important to note that both Chernobyl and Fukushima were both classed INES Level 7, compared to Three Mile Island which was classed INES Level 5. This evidence is contradictory to what was in the quoted part of the paper. (And, sadly, I think supports an argument that Goodhart's Curse is in play... that safety regressed to the mean... that by establishing minimum safety criteria instead of a safety culture, certain disasters not only could not be avoided but were more pronounced than previous disasters.) So both of the worst reactor disasters in human history occurred after the safety cultures that were promoted following Three Mile Island.[1][2] The list of nuclear accidents is longer than this, but not all accidents result in loss.[3][2:1] (This is something that I've been looking at for a while, to inform my predictions about the probability of humans adopting AI safety practices with regards to pre- or post- AI disasters.)
In my personal capacity (read: area of employment) I'm advocating for adversarial testing of AI chatbots. I am highlighting the "accidents" that have already occurred: Microsoft Tay Tweets[4], SnapChat AI Chatbot[5], Tessa Wellness Chatbot[6], Chai Eliza Chatbot[7].
I am promoting the mindset that if we want to be successful with artificial intelligence, and do not want to become a news article, that we should test expressly for ways that the chatbot can be diverted from the chatbots primary function, and design (or train) fixes for those problems. It requires creativity, persistence and patience... but the alternative is that one day, we might be in the news if we fail to proactively address the challenges that obviously face anyone who is trying to use artificial intelligence.
And, like my advocacy about looking at what values a culture should have that wants to adopt a pro-remote culture and be successful at it, we should look at what values a culture should have that wants to adopt a pro-safety-first culture and be successful at it.
I'll be cross posting the original paper to my work. Thank you for sharing.
DISCLAIMER: AI was used to quality check my post, assessing for consistency, logic and soundness in reasoning and presentation styles. No part of the writing was authored by AI.
https://www.processindustryforum.com/energy/five-worst-nuclear-disasters-history ↩︎
https://en.wikipedia.org/wiki/Nuclear_and_radiation_accidents_and_incidents ↩︎ ↩︎
https://ieer.org/resource/factsheets/table-nuclear-reactor-accidents/ ↩︎
https://www.washingtonpost.com/technology/2023/03/14/snapchat-myai/ ↩︎
https://www.nytimes.com/2023/06/08/us/ai-chatbot-tessa-eating-disorders-association.html ↩︎
https://www.complex.com/life/father-dies-by-suicide-conversing-with-ai-chatbot-wife-blames ↩︎
Thanks, this is great commentary.
On your point about safety culture after 3MI, when it took hold, and regression to the mean, see this article: https://www.thenation.com/article/archive/after-three-mile-island-rise-and-fall-nuclear-safety-culture/ Also, for more background about post-3MI safety, see this report: https://inis.iaea.org/collection/NCLCollectionStore/_Public/34/007/34007188.pdf?r=1&r=1
This is a linkpost (to the EA forum version of this post, which is) for a new preprint, entitled "Building a Culture of Safety for AI: Perspectives and Challenges," and a brief explanation of the central points. Comments on the ideas in the post are welcome, but much of the content which clarifies the below is in the full manuscript.
Safety culture in AI is going to be critical for many of the other promising initiatives for AI safety.
However, there are lots of challenges to making such a culture.
Thankfully, there are some promising approaches, especially on the last point. These include identifying future risks proactively via various risk analysis methods, red-teaming, and audits. But as noted above, audits are most useful once safety culture is prioritized - though there is some promise in the near-term for audits to make lack of safety common knowledge.
Next steps include building the repertoire of tools that will reduce risks and can be used to routinize and inculcate safety culture in the industry, and getting real buy-in from industry leaders for prioritizing safety.
Thanks to Jonas Schuett, Shaun Ee, Simeon Campos, Tom David, Joseph Rogero, Sebastian Lodemann, and Yonaton Cale for helpful suggestions on the manuscript.