
handoflixue comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong Discussion

18 Post author: ancientcampus 22 January 2013 08:22PM




Comment author: handoflixue 30 January 2013 09:52:59PM 0 points [-]

I also suspect that 50% chance of being friendly is generous

This is rather relevant :)

Basically, I'm killing any AI that doesn't signal friendliness in some way. I currently have some prior that the AI is friendly - call it a 1% chance. I've pre-committed to destroying AIs, so clearly I'll destroy any AI whose posterior probability of friendliness is 1% or less after I update on the message it sends. The exact threshold is unknown to me, because I'm human - so mild evidence of friendliness, say a bump to 1.1%, might still get it destroyed.
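The decision rule above can be sketched as a one-step Bayes update followed by a threshold check. All the numbers below (likelihoods, the 1% prior and threshold) are illustrative assumptions, not anything specified in the scenario:

```python
def posterior_friendly(prior, p_msg_given_friendly, p_msg_given_unfriendly):
    """Bayes update on the single message the AI is allowed to send."""
    p_msg = (p_msg_given_friendly * prior
             + p_msg_given_unfriendly * (1 - prior))
    return p_msg_given_friendly * prior / p_msg

def gatekeeper_decision(posterior, threshold=0.01):
    """Destroy the AI unless its message raised P(friendly) above the threshold."""
    return "spare" if posterior > threshold else "destroy"

prior = 0.01  # 1% prior that the AI is friendly (hypothetical)
# A message that a friendly AI is twice as likely to send as an unfriendly one:
post = posterior_friendly(prior, 0.8, 0.4)
decision = gatekeeper_decision(post)
```

Note that because the gatekeeper's real threshold is fuzzy, a posterior barely above 1% offers no safety margin in practice.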

The AI gets one chance to update my probability that it's friendly (this is the original message). As you aptly demonstrated, hacking does not change my odds, so I retain my commitment to kill it.

The fact that my probability hasn't changed - that I've gained no information - is itself something I can use as evidence. Same as how we conclude god doesn't exist because we would expect to see evidence if he did :)
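The linked principle is easy to verify numerically: if evidence E is more likely under hypothesis H than under not-H, then *failing* to observe E must lower P(H). A minimal sketch, with purely illustrative probabilities:

```python
def update(prior, p_e_given_h, p_e_given_not_h, observed):
    """Bayes update on whether evidence E was observed."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    if observed:
        return p_e_given_h * prior / p_e
    return (1 - p_e_given_h) * prior / (1 - p_e)

prior = 0.01
# Assume a friendly AI would produce convincing evidence with p=0.9,
# an unfriendly one only with p=0.5. Seeing nothing lowers the posterior:
post = update(prior, 0.9, 0.5, observed=False)
assert post < prior  # absence of evidence is evidence of absence
```

If E were equally likely either way, observing nothing would leave the probability unchanged - absence of evidence only counts when evidence was actually expected.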

(Ref: http://lesswrong.com/lw/ih/absence_of_evidence_is_evidence_of_absence/)