I wonder if "transparency" as an instrumental goal is a lost purpose. Presumably your original goal is "safety". One way to ensure safety is validation, which transparency makes easier because "then additional methods of reliability verification are available". But that does not help for sufficiently complex systems (your caveat 1), no matter how transparently they are designed. So maybe it's worth looking for other instrumental sub-goals toward the terminal goal of safety. A couple off the top of my head:
multi-layer design with built-in verification in each layer (cf. unit testing, integration testing, black-box testing, etc. in programming practice), to avoid having to validate all possible combinations of elementary inputs.
design for easy validation (for example, sorting a list is harder than verifying that it is sorted).
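The "design for easy validation" point can be made concrete with the sorting example: producing a sorted list costs O(n log n), but checking a proposed output is cheap. A minimal sketch (the helper names `is_sorted` and `is_valid_sort` are my own, for illustration):

```python
from collections import Counter

def is_sorted(xs):
    """Linear-time check that xs is in non-decreasing order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def is_valid_sort(original, result):
    """Full validation of a sorting result: the output must be
    ordered AND be a permutation (same multiset) of the input."""
    return is_sorted(result) and Counter(result) == Counter(original)

data = [3, 1, 4, 1, 5]
result = sorted(data)              # the hard part: computing
assert is_valid_sort(data, result)  # the easy part: validating
```

The point is that the validator never needs to know how the sort was computed; it only inspects the input/output pair, which is why validation can stay simple even when the system producing the answer is complex.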
Isn't that what transparent formal verification is? Being aware of the structure of the system, you prove that each component does what it's supposed to, then prove that the combination does what it's supposed to, given that the components do. You need transparent access to the structure of the system and to the behavior of the subcomponents to do this.
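The compositional idea above can be sketched with an ordinary test suite in place of proofs. The two-stage pipeline below (`parse_ints` and `total` are hypothetical components, not from the post) checks each component in isolation, then checks the composition once, rather than testing every combination of raw inputs end to end:

```python
# Component 1: turn a comma-separated string into a list of ints.
def parse_ints(text):
    return [int(tok) for tok in text.split(",")]

# Component 2: reduce a list of ints to their sum.
def total(nums):
    return sum(nums)

# The composed system.
def pipeline(text):
    return total(parse_ints(text))

# Layer 1: verify each component against its own spec, in isolation.
assert parse_ints("1,2,3") == [1, 2, 3]
assert total([1, 2, 3]) == 6

# Layer 2: given the components meet their specs, a single check of
# the composition stands in for validating all elementary inputs.
assert pipeline("1,2,3") == 6
```

This mirrors the formal-verification version of the argument: the component-level checks play the role of lemmas about each part, and the composition check plays the role of the theorem that the whole meets its spec assuming the parts meet theirs, and both require visibility into the system's internal structure.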
I've just posted an analysis to MIRI's blog called Transparency in Safety-Critical Systems. Its aim is to explain a common view about transparency and system reliability, and then to open a dialogue about which parts of that view are wrong or don't apply well to AGI.
The "common view" (not universal by any means) explained in the post is, roughly:
Three caveats / open problems listed at the end of the post are:
The MIRI blog has only recently begun to regularly host substantive, non-news content, so it doesn't get much commenting action yet. Thus, I figured I'd post here and try to start a dialogue. Comment away!