The difficulty of writing software guaranteed to obey a formal specification could be sidestepped with a two-stage approach. Once you were convinced that an AI with certain properties would do what you want, you could build one that accepts mathematical statements in a simple, formal notation and outputs proofs of those statements in an equally simple notation. Then you would put it in a box as a safety precaution, allow only a proof checker to see its output, and use it to prove the correctness of your code.
The remaining problem would be to write a simple, provably correct proof-verification program, which sounds challenging but doable.
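To make the idea concrete, here is a minimal sketch of such a checker in OCaml (the language, the names, and the tiny Hilbert-style logic are all my own choices, purely for illustration): the boxed prover would emit a list of proof steps, and this small program either accepts or rejects them.

```ocaml
(* Formulas of the implicational fragment of propositional logic. *)
type formula =
  | Var of string
  | Imp of formula * formula

(* A proof is a list of steps; each step is an instance of axiom schema
   K or S, or modus ponens applied to two earlier steps (by index). *)
type step =
  | AxiomK of formula * formula            (* a -> (b -> a) *)
  | AxiomS of formula * formula * formula  (* (a->(b->c)) -> ((a->b)->(a->c)) *)
  | ModusPonens of int * int               (* indices proving a->b and a *)

(* Check every step; return the conclusion if the whole proof is valid. *)
let check (proof : step list) : formula option =
  let n = List.length proof in
  let proved = Array.make n None in        (* proved.(i): formula at step i *)
  let step_ok i s =
    let f =
      match s with
      | AxiomK (a, b) -> Some (Imp (a, Imp (b, a)))
      | AxiomS (a, b, c) ->
          Some (Imp (Imp (a, Imp (b, c)), Imp (Imp (a, b), Imp (a, c))))
      | ModusPonens (j, k) when 0 <= j && j < i && 0 <= k && k < i ->
          (match proved.(j), proved.(k) with
           | Some (Imp (a, b)), Some a' when a = a' -> Some b
           | _ -> None)
      | ModusPonens _ -> None
    in
    match f with
    | Some f -> proved.(i) <- Some f; true
    | None -> false
  in
  if n > 0 && List.for_all (fun (i, s) -> step_ok i s)
                (List.mapi (fun i s -> (i, s)) proof)
  then proved.(n - 1)
  else None

(* Example: the standard five-step S/K derivation of |- p -> p. *)
let () =
  let p = Var "p" in
  let proof =
    [ AxiomS (p, Imp (p, p), p);   (* step 0 *)
      AxiomK (p, Imp (p, p));      (* step 1 *)
      ModusPonens (0, 1);          (* step 2: (p->(p->p)) -> (p->p) *)
      AxiomK (p, p);               (* step 3: p -> (p->p) *)
      ModusPonens (2, 3) ]         (* step 4: p -> p *)
  in
  assert (check proof = Some (Imp (p, p)))
```

A real checker would need a much richer logic, but the point stands: the trusted component is a few dozen lines a human can audit line by line, no matter how the untrusted prover found the proof.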
The last time I explored this area was around 2004, and HOL4 appeared to me to be the best of breed for proof manipulation (construction and verification). There is a variant under active development, HOL Zero, that aims specifically to be small and easy to verify. They offer $100 rewards to anyone who can find soundness flaws in it.
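For anyone wondering how HOL Zero can hope to be small enough to verify: HOL-family provers use the LCF architecture, in which theorems are an abstract type whose only constructors are the kernel's inference rules. Here is a toy sketch of that design, in the same illustrative OCaml as above (the module and rule names are my own, not HOL Zero's actual API):

```ocaml
(* Toy illustration of the LCF kernel discipline (names invented here).
   Because `thm` is abstract, the only way any code can obtain a theorem
   is through the rules below, so checking the soundness of the whole
   system reduces to auditing this one small module. *)
module Kernel : sig
  type formula = Var of string | Imp of formula * formula
  type thm                                   (* abstract: no way to forge one *)
  val concl : thm -> formula
  val axiom_k : formula -> formula -> thm    (* |- a -> (b -> a) *)
  val mp : thm -> thm -> thm                 (* |- a -> b and |- a give |- b *)
end = struct
  type formula = Var of string | Imp of formula * formula
  type thm = Thm of formula
  let concl (Thm f) = f
  let axiom_k a b = Thm (Imp (a, Imp (b, a)))
  let mp (Thm ab) (Thm a) =
    match ab with
    | Imp (a', b) when a' = a -> Thm b
    | _ -> invalid_arg "mp: antecedent does not match"
end
```

Everything outside the kernel (parsers, tactics, proof search) can be arbitrarily complicated and buggy without compromising soundness, which is why a bounty on soundness flaws in the small core is a meaningful offer.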
Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them I couldn't find it. Where are the arguments addressing this suggestion?