I've just posted an analysis to MIRI's blog called Transparency in Safety-Critical Systems. Its aim is to explain a common view about transparency and system reliability, and then open a dialogue about which parts of that view are wrong, or don't apply well to AGI.

The "common view" (not universal by any means) explained in the post is, roughly:

Black box testing can provide some confidence that a system will behave as intended, but if a system is built such that it is transparent to human inspection, then additional methods of reliability verification are available. Unfortunately, many of AI’s most useful methods are among its least transparent. Logic-based systems are typically more transparent than statistical methods, but statistical methods are more widely used. There are exceptions to this general rule, and some people are working to make statistical methods more transparent.

Three caveats / open problems listed at the end of the post are:

  1. How does the transparency of a method change with scale? A 200-rule logical AI might be more transparent than a 200-node Bayes net, but what if we’re comparing 100,000 rules vs. 100,000 nodes? At least we can query the Bayes net to ask “what it believes about X,” whereas we can’t necessarily do so with the logic-based system.
  2. Do the categories above really “carve reality at its joints” with respect to transparency? Does a system’s status as a logic-based system or a Bayes net reliably predict its transparency, given that in principle we can use either one to express a probabilistic model of the world?
  3. How much of a system’s transparency is “intrinsic” to the system, and how much of it depends on the quality of the user interface used to inspect it? How much of a “transparency boost” can different kinds of systems get from excellently designed user interfaces?

The MIRI blog has only recently begun to regularly host substantive, non-news content, so it doesn't get much commenting action yet. Thus, I figured I'd post here and try to start a dialogue. Comment away!


I wonder if "transparency" as an instrumental goal is a lost purpose. Presumably your original goal is "safety". One way to ensure safety is validation, which is made easier by transparency because "then additional methods of reliability verification are available". Only it does not help for sufficiently complex systems (your caveat 1), no matter how transparently designed they are. So maybe it's worth looking for other instrumental sub-goals toward the terminal goal of safety. A couple off the top of my head:

  • multi-layer design with built-in verification in each layer (cf. unit-testing/integration testing/black-box testing etc. in programming practice) to avoid having to validate all possible combinations of elementary inputs.

  • design for easy validation (for example, sorting a list is harder than verifying that it is sorted).
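
To make the second bullet concrete, here is a minimal Haskell sketch (names are mine, purely illustrative): validating an untrusted sorter's output takes a few lines that are easy to audit, even if the sorter itself is a complete black box.

```haskell
import qualified Data.Map.Strict as M

-- O(n) check that a list is non-decreasing.
isSorted :: Ord a => [a] -> Bool
isSorted xs = and (zipWith (<=) xs (drop 1 xs))

-- Element counts, used as a permutation check without re-sorting.
counts :: Ord a => [a] -> M.Map a Int
counts = M.fromListWith (+) . map (\x -> (x, 1))

-- Validate the output of an untrusted sorter: the validator is small and
-- easy to audit, even if the sorter itself is an opaque black box.
checkedSort :: Ord a => ([a] -> [a]) -> [a] -> Maybe [a]
checkedSort blackBoxSort xs
  | isSorted ys && counts ys == counts xs = Just ys
  | otherwise                             = Nothing
  where
    ys = blackBoxSort xs
```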


multi-layer design with built-in verification in each layer (cf. unit-testing/integration testing/black-box testing etc. in programming practice) to avoid having to validate all possible combinations of elementary inputs.

Isn't that what transparent formal verification is? Being aware of the structure of the system, prove that each component does what it's supposed to, then prove that the combination does what it's supposed to given that the components do. You have to have transparent access to the structure of the system and the behavior of the subcomponents to do this.
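
A weak but mechanical cousin of that compositional idea can be sketched with types (my own toy Haskell illustration, not a full formal proof): each component's obligation is stated at its boundary, and downstream code can rely on the invariant only through that boundary.

```haskell
module Compositional (SortedList, fromList, member) where

-- The constructor is not exported, so the only way to obtain a SortedList
-- is via fromList; consumers may then assume the invariant holds.
newtype SortedList a = SortedList [a]

-- Component 1: establishes the "sorted" invariant (insertion sort).
fromList :: Ord a => [a] -> SortedList a
fromList = SortedList . foldr insert []
  where
    insert x [] = [x]
    insert x (y:ys)
      | x <= y    = x : y : ys
      | otherwise = y : insert x ys

-- Component 2: relies on the invariant (it may stop scanning early, which
-- is correct only because the list is sorted).
member :: Ord a => a -> SortedList a -> Bool
member x (SortedList ys) = go ys
  where
    go [] = False
    go (z:zs)
      | z < x     = go zs
      | z == x    = True
      | otherwise = False
```

The type system here only enforces that the invariant is established in one small, auditable place; a full verification effort would also prove fromList itself correct, e.g. in a proof assistant.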

Isn't that what transparent formal verification is?

Is it? I have never seen a software system that is as easy to verify as it is to build, let alone easier.

Re: formal verification, I wonder if FAI developers will ultimately end up having to resort to improved programming environments to reduce bug rates, as described by DJB:

For more than a decade I have been systematically identifying error-prone programming habits—by reviewing the literature, by analyzing other people’s mistakes, and by analyzing my own mistakes—and redesigning my programming environment to eliminate those habits. For example, “escape” mechanisms, such as backslashes in various network protocols and % in printf, are error-prone: it’s too easy to feed “normal” strings to those functions and forget about the escape mechanism. I switched long ago to explicit tagging of “normal” strings; the resulting APIs are wordy but no longer error-prone. The combined result of many such changes has been a drastic reduction in my own error rate.

...

Bug-elimination research, like other user-interface research, is highly non-mathematical. The goal is to have users, in this case programmers, make as few mistakes as possible in achieving their desired effects. We don’t have any way to model this—to model human psychology—except by experiment. We can’t even recognize mistakes without a human’s help. (If you can write a program to recognize a class of mistakes, great—we’ll incorporate your program into the user interface, eliminating those mistakes—but we still won’t be able to recognize the remaining mistakes.) I’ve seen many mathematicians bothered by this lack of formalization; they ask nonsensical questions like “How can you prove that you don’t have any bugs?” So I sneak out of the department, take off my mathematician’s hat, and continue making progress towards the goal.

HT Aaron Swartz. If this approach is taken, it probably makes sense to start developing the padded-wall programming environment with non-FAI test programming tasks well before the development of actual FAI begins, so more classes of bugs can be identified and guarded against. See also.

Such an environment would most likely be based on Haskell, if not something more esoteric.

Assuming performance is a concern, Haskell probably strikes the best balance between crazy-powerful type systems and compiler maturity.

As an anecdote, the exact string-escape issue you've mentioned is a (repeatedly) solved problem in that ecosystem, the type system being clever enough to escape most of the verbosity and non-genericity you'd get by trying the same thing in, say, C.
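
For readers who haven't seen that trick, it looks roughly like this (a toy sketch with made-up names, not any particular library's API): untrusted text and escaped output get distinct types, so forgetting to escape is a compile-time error rather than a latent injection bug.

```haskell
-- Untrusted, unescaped text from the outside world.
newtype Raw = Raw String

-- Text that has been escaped and is safe to emit.
newtype Html = Html { unHtml :: String }

-- The only way to turn Raw into Html is to go through escape.
escape :: Raw -> Html
escape (Raw s) = Html (concatMap esc s)
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- The renderer accepts only Html, so passing a Raw value is rejected by
-- the type checker.
render :: Html -> String
render = unHtml
```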

MIRI has mentioned (for example, in the 'Recommended Courses' post) the use of functional programming languages like Haskell in AI for proof-checking reasons.

At least we can query the Bayes net to ask “what it believes about X,” whereas we can’t necessarily do so with the logic-based system.

That assumes that the net contains a node corresponding exactly to what we mean by "X", that we know which node corresponds exactly to "X", and that we know how we know it. With logical rules we can at least have a formal proof that "X" in some model is equivalent to a particular term X in the logical language, and then ask "What can we prove from the logical rules about X?"

Do the categories above really “carve reality at its joints” with respect to transparency? Does a system’s status as a logic-based system or a Bayes net reliably predict its transparency, given that in principle we can use either one to express a probabilistic model of the world?

My intuition is that the ability to write down valid proofs of how a system behaves constitutes transparency, and the lack of that ability constitutes a black box.

How much of a system’s transparency is “intrinsic” to the system, and how much of it depends on the quality of the user interface used to inspect it? How much of a “transparency boost” can different kinds of systems get from excellently designed user interfaces?

Systems that are not amenable to formal proofs or that have numerous edge cases will have less transparency. Tools for building succinct, orthogonal, modular, formalized systems will probably result in much more transparent systems. The most amazing tool in the world for training artificial neural networks will still produce a black box (unless it also happens to provide a formal model of the network's behavior in an accessible and meaningful format, in which case why even bother running the ANN?).

Bayes nets can answer many queries not corresponding to any one node, most famously of the form P(A|B).
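
For instance (a toy two-variable net Rain -> WetGrass with made-up numbers), the query P(Rain | WetGrass) corresponds to no single node, but falls out of summing over the joint distribution:

```haskell
pRain :: Double
pRain = 0.2

pWetGivenRain, pWetGivenDry :: Double
pWetGivenRain = 0.9
pWetGivenDry  = 0.1

-- Joint probability of an atomic outcome (rain?, wet?).
joint :: Bool -> Bool -> Double
joint rain wet = pR * pW
  where
    pR   = if rain then pRain else 1 - pRain
    pWet = if rain then pWetGivenRain else pWetGivenDry
    pW   = if wet then pWet else 1 - pWet

-- P(Rain | Wet) = P(Rain, Wet) / P(Wet), with P(Wet) obtained by summing out Rain.
pRainGivenWet :: Double
pRainGivenWet = joint True True / (joint True True + joint False True)
-- ≈ 0.692 with the numbers above
```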

I for one can't agree with the point that transparency does any good in security assessment if we consider the implementation of a complex system (design has its own rules though). I believe you underestimate how broken a human mind really is.

Transparency == Priming

The team which does security review of the system will utterly fail the moment they get their hands on the source code, due to suggestion/priming effects.

  • comments in source - I won't even argue
  • variable and function names will suggest what a piece of code "is for". E.g., the code may say it was (initially) written to compute X, for which it is correct, except that later it was also made to compute Y and Z, where it catches fire 0.01% of the time.
  • whitespace. E.g., missing braces that went unnoticed because the indentation suggested otherwise (yes, really, I've seen this one).

If you truly consider removing all metadata from the code, the code already loses half of its transparency, so "transparency" doesn't quite apply. Any of that metadata will cause the security team to drop some testing effort. Those tests won't even be designed or written, not to mention "caring enough to make an actual effort". On the other hand, if a program passes serious (more below) black-box testing, it doesn't need to pass anything else. Transparency is simply unnecessary.

Hardcore solution to security-critical systems:

  1. Have a design of the system (this has issues of its own, not covered here).
  2. Get two teams of programmers.
  3. Have both teams implement and debug the system, separately, without any communication.
  4. Have each team review the other team's binary (black-box). If bugs are found, go to step 3.
  5. Obfuscate both binaries (tools are easily available) and have each team review its own (obfuscated) binary while believing it is the other team's binary. If bugs are found, go to step 3.
  6. At this point you could open up the sources and have a final review ("transparency"), but honestly... what's the point?

Yes, it's paranoid. Systems can be split into smaller parts though - built and tested separately - so it's not as monumental an effort as it seems.


I think you are thinking about transparency differently than OP.

You seem to be thinking of informal code review type stuff (hence the comments and function names gripe), and not formal, mechanical verification, which is what OP is talking about (I think).

At this point you could open up sources and have a final review ("transparency") but honestly... what's the point?

The point is that black box testing can only realistically verify a tiny slice of input-output space. You cannot prove theorems involving universal quantification, for example, without literally checking every input (which may not fit in the known universe). So if the system has some esoteric failure mode that you didn't manage to test for, you don't catch it.

On the other hand, "transparent" testing is where you give, e.g., a type checker access to the internal structure so it can immediately prove things like "nope, this function cannot match the spec, and will fail by adding a list to a number when fed input X".

As a serious, if trivial, example, imagine black-box testing a quicksort. You test it on 1000 large random lists and measure the average and worst-case running time. You probably get O(n*log(n)) for both. You deploy the code, someone disassembles it, designs a killer input, and pwns your system, because quicksort has rare inputs on which it goes O(n^2).
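
Here is a rough sketch of that gap (toy numbers, a textbook first-element-pivot quicksort instrumented to count comparisons): pseudo-random sampling sees roughly n*log(n) behaviour, while an already-sorted list, which uniform sampling will essentially never produce, hits the quadratic worst case.

```haskell
-- Approximate comparison count of a first-element-pivot quicksort
-- (one comparison per element against the pivot at each level).
qsortComparisons :: Ord a => [a] -> Int
qsortComparisons []     = 0
qsortComparisons (p:xs) =
  length xs
    + qsortComparisons [x | x <- xs, x <  p]
    + qsortComparisons [x | x <- xs, x >= p]

-- A crude pseudo-random list (linear congruential generator), standing in
-- for black-box test inputs.
pseudoRandom :: Int -> [Int]
pseudoRandom n = take n (iterate step 12345)
  where step x = (1103515245 * x + 12345) `mod` 2147483648

main :: IO ()
main = do
  let n = 2000
  putStrLn ("pseudo-random input: " ++ show (qsortComparisons (pseudoRandom n)))
  putStrLn ("already sorted:      " ++ show (qsortComparisons [1 .. n]))
  -- The second count is roughly n^2/2: the worst case random sampling missed.
```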

Transparency isn't only about reading the source code or not, it's also about whether you can do formal deduction or not.

Thus the design (i.e. "The Math") vs. implementation (i.e. "The Code") division. I believe design verification suffers from the same problems as implementation verification, albeit maybe less severely (though I have never worked with really complex, novel, abstract math... it would be interesting to see how many of those, on average, are "proved" correct and then blow up).

Still, I would argue that the problem is not that black-box testing is insufficient - it works where we are currently able to apply it - but rather that we have no idea how to properly black-box-test an abstract, novel, complex system!

PS. Your trivial example is also unfair and trivializes the technique. Black-box testing in no way implies randomizing all tests, and I would expect the quicksort to blow up very quickly in serious testing.
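
To gesture at what "serious testing" might mean while staying black-box, here is a QuickCheck sketch (the sorter under test and the time budget are placeholders): correctness properties over arbitrary inputs, plus a deliberately structured generator of already-sorted lists with a timeout to flag pathological slowdowns.

```haskell
import Data.List (sort)
import Test.QuickCheck

isSorted :: Ord a => [a] -> Bool
isSorted xs = and (zipWith (<=) xs (drop 1 xs))

-- Correctness on arbitrary inputs: output is sorted and a permutation.
prop_correct :: ([Int] -> [Int]) -> [Int] -> Bool
prop_correct sorter xs = isSorted ys && sort ys == sort xs
  where ys = sorter xs

-- Structured inputs: large already-sorted lists with a time budget.
-- 'within' takes microseconds; the bound here is purely illustrative.
prop_noBlowupOnSorted :: ([Int] -> [Int]) -> Property
prop_noBlowupOnSorted sorter =
  forAll (resize 4000 orderedList) $ \xs ->
    within 200000 (sorter xs == xs)

main :: IO ()
main = do
  quickCheck (prop_correct sort)            -- library sort as a stand-in
  quickCheck (prop_noBlowupOnSorted sort)
```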

I'm curious as to whether you would prefer comments, critiques etc here or at the MIRI blog.

How sure are you that a transparent program can be powerful enough to be an FAI?