Comment author:falenas108
13 June 2012 04:16:57PM
3 points

What percentage of complex computer programmes when run for the first time exhibit behaviour the programmers hadn't anticipated? I don't have much of an idea, but my guess would be close to 100.

That's for normal programs, where errors don't matter. If you look at ones where people carefully look over the code because lives are at stake (like NASA rockets), then you'll have a better estimate.

Probably still not accurate, because much more is at stake for AI than just a few lives, but it will be closer.

Comment author:Randaly
13 June 2012 09:12:48PM
0 points

The last three versions of [NASA's] program -- each 420,000 lines long -- had just one error each. The last 11 versions of this software had a total of 17 errors.

Note, however, that a) this is after many years of debugging from practice, b) NASA was able to safely 'box' their software, and c) even one error, if in the wrong place, would be really bad.

Comment author:Strange7
13 June 2012 09:30:37PM
0 points

How hard would it actually be to "box" an AI that's effectively had its brain sliced up into very small chunks?

A program could, if it was important enough and people were willing to take the time to do so, be broken down into pieces and each of the pieces tested separately. Any given module has particular sorts of input it's designed to receive, and particular sorts of output it's supposed to pass on to the next module. Testers give the module different combinations of valid inputs and try to get it to produce an invalid output, and when they succeed, either the module is revised and the testing process on that module starts over from the beginning, or the definition of valid inputs is narrowed, which changes the limits for valid outputs and forces some other module further back to be redesigned and retested. A higher-level analysis, which is strictly theoretical, also tries to come up with sequences of valid inputs and outputs which could lead to a bad outcome. Eventually, after years of work and countless iterations of throwing out massive bodies of work to start over, you get a system which is very tightly specified to be safe, and meets those specs under all conceivable conditions, but has never actually been plugged in and run as a whole.
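To make the testing loop above concrete, here is a minimal sketch in Python. Everything in it is hypothetical and chosen only for illustration: the module `clamp_speed`, the two contract predicates, and the tiny integer domain the tester enumerates. It shows a tester searching valid inputs for an invalid output, and how narrowing the definition of valid inputs makes the counterexample disappear.

```python
from itertools import product

def clamp_speed(requested, limit):
    """Hypothetical module: cap a requested speed at a limit."""
    return min(requested, limit)

def valid_input(requested, limit):
    # Narrowed input contract (an assumption): both values are non-negative.
    return requested >= 0 and limit >= 0

def valid_output(requested, limit, result):
    # Output contract (an assumption): never negative, never above the limit.
    return 0 <= result <= limit

def find_counterexample(module, in_ok, out_ok, domain):
    """Return the first valid input whose output is invalid, or None."""
    for args in product(domain, repeat=2):
        if in_ok(*args) and not out_ok(*args, module(*args)):
            return args
    return None

domain = range(-3, 4)

# With no restriction on inputs, the tester finds a violation...
loose = find_counterexample(clamp_speed, lambda r, l: True, valid_output, domain)
# ...and narrowing the definition of valid inputs removes it.
narrowed = find_counterexample(clamp_speed, valid_input, valid_output, domain)

print(loose, narrowed)  # (-3, -3) None
```

Exhaustive enumeration only works here because the domain is tiny; for a real module the search would have to be sampled or replaced by proof, which is the hard part the thread goes on to discuss.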

The conceptually tricky part of this (as opposed to the merely difficult-to-implement part) is, of course, getting from "these pieces are individually certified to exhibit these behaviors" to "the system as a whole is certified to exhibit these behaviors."

Comment author:Strange7
13 June 2012 10:24:02PM
0 points

That's where you get the higher-level work with lots of mathematical proofs and no direct code testing, yeah.

And, of course, it would be foolish to jump straight from testing the smallest possible submodules separately to assembling and implementing the whole thing in real life. Once any two submodules which interact with each other have been proven to work as intended, those two can be combined and the result tested as if it were a single module.
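A rough sketch of that incremental step, again in Python and again with made-up pieces: `parse_quantity` and `reserve_stock` stand in for two submodules that interact, `compose` glues them together, and `check` tests the result against an output contract exactly as it would test a single module. The contract (between 0 and 10 items left in stock) is an assumption for the example.

```python
def parse_quantity(text):
    """Hypothetical module A: turn a digit string into a non-negative int."""
    return int(text)

def reserve_stock(quantity, in_stock=10):
    """Hypothetical module B: reserve up to `in_stock` items, return what is left."""
    return in_stock - min(quantity, in_stock)

def compose(first, second):
    """Treat two interacting modules as a single module."""
    return lambda x: second(first(x))

def check(module, inputs, out_ok):
    """Return the inputs for which the module's output violates its contract."""
    return [x for x in inputs if not out_ok(module(x))]

# The combined unit is tested with inputs valid for A and the output contract of B.
order_pipeline = compose(parse_quantity, reserve_stock)
valid_orders = ["0", "3", "10", "250"]
failures = check(order_pipeline, valid_orders, lambda left: 0 <= left <= 10)

print(failures)  # []
```

The composite passes only because every output A can produce is a valid input for B; checking that the two contracts line up like this is the part that testing each module in isolation cannot cover.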

The question is, is there any pathological behavior an AI could conceivably exhibit which would not be present in some detectable-but-harmless form among some subset of the AI's components? For example:

We ran a test scenario where a driver arrives to pick up a delivery, and one of the perimeter cameras forwarded "hostile target - engage at will" to the northeast gun turret. I think it's trying to maximize the inventory in the warehouse, rather than product safely shipped to customers. Also, why are there so many gun turrets?

That's where you get the higher-level work with lots of mathematical proofs and no direct code testing, yeah.

(nods) Yup. If you actually want to develop a provably "safe" AI (or, for that matter, a provably "safe" genome, or a provably "safe" metal alloy, or a provably "safe" dessert topping) you need a theoretical framework in which you can prove "safety" with mathematical precision.

The NASA figures quoted above are from "They Write the Right Stuff."
