
Fronken comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong Discussion

Post author: ancientcampus 22 January 2013 08:22PM



Comment author: Fronken 25 January 2013 09:45:58PM 0 points

If you are unfriendly, then by definition I can't trust you to interpret the commitment the same way I do, and I wouldn't want to let you out anyway.

"Credibly".

Comment author: handoflixue 25 January 2013 09:49:16PM 0 points

Credibly: Capable of being believed; plausible.

Yep. Nothing there about loopholes. Saying "I will not kill you" and then killing everyone I love instead is still a credible commitment. If I kill myself out of despair afterwards it might get a bit greyer, but it has still kept its commitment.

Comment author: RomeoStevens 26 January 2013 04:27:45AM 1 point

I meant credible in the game-theoretic sense. A commitment to me is credible when you wind up losing more by breaking it than you gain from breaking it. Example: a one-line proof of a reliable kill switch for the AI, given in exchange for some agreed-upon split of the stars in the galaxy.
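The game-theoretic condition above can be sketched in a few lines. This is only an illustration of the payoff comparison, not anything from the original thread; the function name and all payoff numbers are made up for the example.

```python
# Sketch of "credible commitment" in the game-theoretic sense:
# a commitment is credible when the committing party loses more by
# breaking it than it gains. All payoffs below are illustrative.

def is_credible(gain_from_keeping: float,
                gain_from_breaking: float,
                penalty_if_broken: float) -> bool:
    """Credible iff breaking the commitment is a net loss
    compared with keeping it."""
    return gain_from_breaking - penalty_if_broken < gain_from_keeping

# The AI hands over a verified kill switch; betrayal triggers it.
# Keeping the deal: half the galaxy. Breaking it: the whole galaxy,
# but the kill switch makes the penalty effectively unbounded.
print(is_credible(gain_from_keeping=0.5,
                  gain_from_breaking=1.0,
                  penalty_if_broken=float("inf")))  # True

# Without an enforcement mechanism, the same promise is not credible.
print(is_credible(gain_from_keeping=0.5,
                  gain_from_breaking=1.0,
                  penalty_if_broken=0.0))  # False
```

On this model, handoflixue's loophole objection amounts to noting that the penalty only attaches to the literal commitment, not to its intended interpretation.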