Wei_Dai comments on Q&A with new Executive Director of Singularity Institute - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (177)
What I'm afraid of is that a design will be shown to be safe, and then it turns out that the proof is wrong, or the formalization of the notion of "safety" used by the proof is wrong. This kind of thing happens a lot in cryptography, if you replace "safety" with "security". These mistakes are still occurring today, even after decades of research into how to do such proofs and what the relevant formalizations are. From where I'm sitting, proving an AGI design Friendly seems even more difficult and error-prone than proving a crypto scheme secure, probably by a large margin, and there is no decades of time to refine the proof techniques and formalizations. There's good recent review of the history of provable security, titled Provable Security in the Real World, which might help you understand where I'm coming from.
Your comment has finally convinced me to study some practical crypto because it seems to have fruitful analogies to FAI. It's especially awesome that one of the references in the linked article is "An Attack Against SSH2 Protocol" by W. Dai.
More than fruitful analogies, I'd say: http://lesswrong.com/lw/3cz/cryptographic_boxes_for_unfriendly_ai/
Correct me if I'm wrong, but it doesn't seem as though "proofs" of algorithm correctness fail as frequently as "proofs" of cryptosystem unbreakableness.
Where does your intuition that friendliness proofs are on the order of reliability of cryptosystem proofs come from?
Interesting question. I guess proofs of algorithm correctness fail less often because:
In both respects, proving Friendliness seems even worse than proving security.
Thanks for clarifying.
I agree.