Summary: My intuition is that "High Reliability Organizations" may not be the best parallel here: A better one is probably "organizations developing new high-tech systems where the cost of failure is extremely high". Examples are organizations involved in chip design and AV (Autonomous Vehicle) design.
I'll explain below why I think they are a better parallel, and what we can learn from them. But first:
Some background notes:
I have spent many years working in those industries, and in fact participated in inventing some of the related verification / validation / safety techniques ("V&V techniques" for short).
Chip design and AV design are different. Also, AV design (and the related V&V techniques) is still a work in progress, so I'll present a slightly idealized version of it.
I am not sure that "careful bootstrapped alignment", as described, will work, for the various reasons Eliezer and others are worried about: We may not have enough time or enough worldwide coordination. However, for the purpose of this thread, I'll ignore that, and do my best to (hopefully) help improve it.
Why this is a better parallel: Organizations which develop new chips / AVs / etc. have a process (and related culture) of "creating something new, in stages, while being very careful to avoid bugs". The cost-of-failure is huge: A chip design project / company could die if too many bugs are "left in" (though safety is usually not a major concern). Similarly, an AV project could die if too many bugs (mostly safety-related) cause too many visible failures (e.g. accidents).
And when such a project fails, a few billion dollars could go up in smoke. So a very high-level team (including the CEO) needs to review the V&V evidence and decide whether to deploy / wait / deploy-reduced-version.
How they do it: Because the stakes are so high, these organizations are often split into a design team and an (often bigger) V&V team. The V&V team is typically more inventive and enterprising (and less prone to Goodharting and "V&V theatre") than the corresponding teams in "High Reliability Organizations" (HROs).
Note that I am not implying that people in HROs are very prone to those things – it is all a matter of degree: The V&V teams I describe are simply incentivized to find as many "important" bugs as possible per day (given finite compute resources). And they work on a short (several years), very intense schedule.
They employ techniques like a (constantly-updated) verification plan and safety case. They also work in stages: Your initial AV may be deployed only in specific areas / weathers / time-of-day and so on. As you gain experience, you "enlarge" the verification plan / safety case, and start testing accordingly (mostly virtually). Only when you feel comfortable with that do you actually "open up" the area / weather / number-of-vehicles / etc. envelope.
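To make the staged process concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the names (Envelope, StagedRollout, enlarge_plan, open_up) and the "zero open important bugs" gate are hypothetical, not taken from any real AV toolchain. The point it shows is the ordering: the verified envelope grows first, and the deployed envelope can only ever grow into what was already verified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical operating envelope: where and when the AV may run."""
    areas: frozenset
    weathers: frozenset
    max_vehicles: int

    def covers(self, other: "Envelope") -> bool:
        # True if `other` lies entirely inside this envelope.
        return (other.areas <= self.areas
                and other.weathers <= self.weathers
                and other.max_vehicles <= self.max_vehicles)


class StagedRollout:
    """Tracks two envelopes: what is verified and what is deployed."""

    def __init__(self, initial: Envelope):
        self.verified = initial  # covered by verification plan / safety case
        self.deployed = initial  # actually open on the road

    def enlarge_plan(self, candidate: Envelope, open_important_bugs: int) -> bool:
        # Step 1: extend the verified envelope after (mostly virtual) testing.
        # Assumed gate: no important bugs may remain open for the candidate.
        if not candidate.covers(self.verified) or open_important_bugs > 0:
            return False
        self.verified = candidate
        return True

    def open_up(self, candidate: Envelope, reviewers_approve: bool) -> bool:
        # Step 2: deploy only inside what was verified, and only after the
        # high-level review signs off on the V&V evidence.
        if reviewers_approve and self.verified.covers(candidate):
            self.deployed = candidate
            return True
        return False
```

In use, a call to open_up for a larger envelope is refused until enlarge_plan has succeeded for it, which mirrors "only when you feel comfortable with that do you actually open up the envelope".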
Will be happy to talk more about this.