
Benja comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong Discussion

18 Post author: ancientcampus 22 January 2013 08:22PM




Comment author: Benja 31 March 2013 08:43:48AM 1 point

I think yes/no answers to mathematical questions wouldn't be safer than yes/no answers to other questions: you still have a real-world reason for asking them, and the AI can lie about as well as it can on any yes/no question.

Asking for formalized proofs might be more promising, if you can produce a proof checker that you are highly confident can't be hacked. This seems difficult but possible: proof checkers can be very simple programs, but you need to guard both against ordinary attacks on the entire stack you're running on and against creative attacks, like waiting for a cosmic ray to make a transistor malfunction. You'd also want to not have a human look at the proof, of course, since that would give the AI a much broader communication channel.
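
To give a rough sense of how small such a checker can be, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the formula encoding (atoms as strings, implications as `("->", A, B)` tuples), the step format, and the name `check_proof` are all hypothetical, and the only inference rule is modus ponens. It says nothing about securing the stack underneath it. The relevant property is that the caller only ever sees one bit, accepted or rejected, and never the proof itself.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: an atom is a string, an implication is ("->", A, B).
Formula = Union[str, tuple]
# A step is ("premise", formula) or ("mp", i, j), where i and j index earlier steps.
Step = Tuple


def check_proof(premises: List[Formula], steps: List[Step], goal: Formula) -> bool:
    """Return True only if `steps` is a valid derivation of `goal` from `premises`.

    Any malformed input is rejected. The return value is a single bit.
    """
    derived: List[Formula] = []
    for step in steps:
        try:
            kind = step[0]
            if kind == "premise":
                formula = step[1]
                if formula not in premises:
                    return False
            elif kind == "mp":
                i, j = step[1], step[2]
                if not (isinstance(i, int) and isinstance(j, int)):
                    return False
                # Both indices must refer to lines that were already derived.
                if not (0 <= i < len(derived) and 0 <= j < len(derived)):
                    return False
                antecedent = derived[i]
                implication = derived[j]
                # Line j must have the form antecedent -> something.
                if not (
                    isinstance(implication, tuple)
                    and len(implication) == 3
                    and implication[0] == "->"
                    and implication[1] == antecedent
                ):
                    return False
                formula = implication[2]
            else:
                return False
        except (IndexError, TypeError):
            # Wrong shape or wrong type anywhere in the step: reject.
            return False
        derived.append(formula)
    # The proof counts only if its last line is exactly the goal.
    return bool(derived) and derived[-1] == goal


premises = ["p", ("->", "p", "q"), ("->", "q", "r")]
proof = [
    ("premise", "p"),
    ("premise", ("->", "p", "q")),
    ("mp", 0, 1),
    ("premise", ("->", "q", "r")),
    ("mp", 2, 3),
]
accepted = check_proof(premises, proof, "r")
```

A real checker would need a much richer logic, but the shape is the same: a short loop that validates each line against earlier lines and emits a single accept/reject bit.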