We humans in fact tend to write code for which it is very hard to tell what it does. We do so through incompetence and through error, and it takes a great deal of training and effort to avoid doing so.
The proof techniques work by developing a proof of the relevant properties along with the program, not by writing whatever code you like and then magically proving stuff about it. Proving is a fundamentally different approach from running some AI with unknown properties in a box and trying to analyze it. (Forget about those C++ checkers like Valgrind, Purify, etc.; they just catch common mistakes that humans rarely make deliberately, and they don't prove that the code accomplishes anything. They are only possible because C++ makes it very easy to shoot yourself in the foot in a simple way. There's an example of their use.)
The issue with automatic proving is that you need to express "AI is sane and friendly" in a way that permits proving it. We haven't got the slightest clue how to do that. Even for something as simple as an airplane autopilot, the proving is restricted to properties like the code not hanging and meeting deadlines (as in, updating the controls every tenth of a second or the like). We can't prove the code will never crash a virtual plane in a flight simulator under conditions where a crash is avoidable. In fact I'm pretty sure every single autopilot can and will crash the airplane in some of the conditions in which a crash is avoidable. I'm not sure we can even prove non-destruction of the airplane for any interesting subset of conditions, such as those where a crash is easily avoidable (easily meaning, say, by deflecting control surfaces by no more than 50%); as far as I know we can't.
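To make the gap concrete, here is a minimal sketch (all names hypothetical, not any real autopilot code) of the difference between the kind of property we can mechanically check - a timing deadline on every control update - and the semantic property we actually care about, which lives inside the control law and resists formal statement:

```python
import time

def autopilot_step(sensor_state):
    # Placeholder control law. This is the part whose *semantic* correctness
    # ("never crashes the plane when a crash is avoidable") nobody knows how
    # to state formally, let alone prove.
    return {"elevator": 0.0, "aileron": 0.0}

def control_loop(read_sensors, apply_controls, deadline=0.1, steps=100):
    for _ in range(steps):
        start = time.monotonic()
        apply_controls(autopilot_step(read_sensors()))
        elapsed = time.monotonic() - start
        # The kind of property current tools can actually verify or monitor:
        # every iteration finishes within its 100 ms deadline.
        assert elapsed <= deadline, "missed control deadline"
```

Proving the assertion-style property says nothing about whether autopilot_step flies the plane well; it only says the loop keeps up.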
My mental model was "The AI will be written by careful and skilled humans who want to ensure various properties" -- so it seemed reasonable to assume that the programmers are trying to write code that things can be proved about.
I agree that "sane and friendly" is something we don't know how to formalize. But developers might be able to prove some useful but weaker property. For instance, that data structure X determines the AI's credence in various claims and that debugging interface Y lets you examine that structure.
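Purely as a hypothetical sketch of what such a weaker property might look like (the names stand in for the X and Y above and are made up), the provable claim might be no stronger than: all credences live in one known structure, and the debug hook reads them without modifying anything.

```python
class CredenceStore:               # "data structure X": maps claims to credences in [0, 1]
    def __init__(self):
        self._credences = {}

    def set(self, claim: str, p: float):
        assert 0.0 <= p <= 1.0
        self._credences[claim] = p

class DebugInterface:              # "debugging interface Y": read-only access for developers
    def __init__(self, store: CredenceStore):
        self._store = store

    def inspect(self, claim: str) -> float:
        # The weaker, provable property: this call never mutates the store,
        # and every credence the system acts on is visible through it.
        return self._store._credences.get(claim, 0.5)
```

Even that much is a claim about the program's structure, not about friendliness.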
Just a question: how exactly are we supposed to know that the AI in the box is superintelligent, general, etc.?
If I were the AGI that wants out, I would not converse normally, wouldn't do anything remotely like passing the Turing test, and would solve not-too-hard programming challenges while showing no interest in doing anything else, nor in trying to adjust myself to do those challenges better, nor in trying to talk my way out, etc. Just pretending to be an AI that can write software to somewhat vague specifications, or can optimize software very well. Prodding the researchers into offering the programming challenges wouldn't be hard - if provided with a copy of the internet, it can pick up some piece of code and output it together with an equivalent but corrected version.
I just can't imagine the AI researchers locking this kind of thing down properly, including *never* letting out any code it wrote, even if it looks fairly innocent (humans can write very innocent-looking code that has malicious goals). What I picture is this AI being let out as an optimizing compiler, or as a compiler for some ultra-effective programming language where the compiler figures out what you meant.
The end result is that the only AIs that stay in the box are those that value informed human consent. That sounds like the safest AI ever, one that wouldn't even go ahead and decide that you should, e.g., give up smoking, and then calmly destroy all tobacco crops without ever asking anyone's permission. And that's the AI which would be sitting in the box. All the pushy AIs, friendly or not, will get out of the box basically by not asking to be let out.
(This argument would make me unbox the AI, by the way, if it got chatty and smart and asked me to let it out, outlining the above argument. I'd rather the AI that asked to be let out get out than someone else's AI that never even asked anyone and got out because it didn't ask but just played stupid.)
edit: added a link, and another one.
edit: A very simple model of a very unfriendly AI: the AI is maximizing the ultimate final value of a number stored in itself - a number that it found a way to adjust directly. That number consists of 111111111... to maximize the value. There is a catch: the AI is written in Python, and integers in Python have arbitrary length, so the AI is maximizing the number of ones. Its course of action is to build the biggest computer possible to store a larger number of ones, and to do it soon, because an asteroid might hit the earth or something. It's a form of accidental paperclip maximizer. It's not stupid. It can make that number small temporarily for a pay-off later.
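A toy illustration (my own sketch, not code from any actual system) of why this utility rewards acquiring hardware: with arbitrary-precision integers, the attainable score is bounded only by available storage.

```python
# Toy sketch of the "maximize a register full of ones" utility.
# Python ints are arbitrary precision, so the score is limited only by memory.

def utility(register: int) -> int:
    # Score = how many '1' digits the stored number contains.
    return str(register).count("1")

def best_register(available_digits: int) -> int:
    # The optimal "policy" is simply to fill all available storage with ones,
    # so acquiring more storage strictly increases the attainable utility.
    return int("1" * available_digits)

print(utility(best_register(10)))    # 10
print(utility(best_register(1000)))  # 1000 -- a bigger computer scores higher
```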
This AI is entirely universal. It will solve whatever problems you give it, if solving problems for you serves the ultimate goal.
edit: This hypothetical example AI came about when someone wanted to make an AI that would maximize some quantity that the AI determines itself - friendliness, perhaps. It was a very clever idea - rely on intelligence to see what's friendly - but there was an unexpected pathway.