CarlShulman comments on Q&A with new Executive Director of Singularity Institute - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (177)
If we condition on having all other variables optimized, I'd expect a team to adopt very high standards of proof, and recognize limits to its own capabilities, biases, etc. One of the primary purposes of organizing a small FAI team is to create a team that can actually stop and abandon a line of research/design (Eliezer calls this "halt, melt, and catch fire") that cannot be shown to be safe (given limited human ability, incentives and bias). If that works (and it's a separate target in team construction rather than a guarantee, but you specified optimized non-talent variables) then I would expect a big shift of probability from "UFAI" to "null."
What I'm afraid of is that a design will be shown to be safe, and then it turns out that the proof is wrong, or the formalization of the notion of "safety" used by the proof is wrong. This kind of thing happens a lot in cryptography, if you replace "safety" with "security". These mistakes are still occurring today, even after decades of research into how to do such proofs and what the relevant formalizations are. From where I'm sitting, proving an AGI design Friendly seems even more difficult and error-prone than proving a crypto scheme secure, probably by a large margin, and there is no decades of time to refine the proof techniques and formalizations. There's good recent review of the history of provable security, titled Provable Security in the Real World, which might help you understand where I'm coming from.
Your comment has finally convinced me to study some practical crypto because it seems to have fruitful analogies to FAI. It's especially awesome that one of the references in the linked article is "An Attack Against SSH2 Protocol" by W. Dai.
More than fruitful analogies, I'd say: http://lesswrong.com/lw/3cz/cryptographic_boxes_for_unfriendly_ai/
Correct me if I'm wrong, but it doesn't seem as though "proofs" of algorithm correctness fail as frequently as "proofs" of cryptosystem unbreakableness.
Where does your intuition that friendliness proofs are on the order of reliability of cryptosystem proofs come from?
Interesting question. I guess proofs of algorithm correctness fail less often because:
In both respects, proving Friendliness seems even worse than proving security.
Thanks for clarifying.
I agree.