handoflixue comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong

Post author: ancientcampus 22 January 2013 08:22PM 18 points


Comment author: handoflixue 23 January 2013 09:17:15PM 2 points

If you are friendly, then I don't actually value this trait, since I would rather you do whatever is truly optimal, unconstrained by prior commitments.

If you are unfriendly, then by definition I can't trust you to interpret the commitment the same way I do, and I wouldn't want to let you out anyway.

(AI DESTROYED, but I still really do like this answer :))

Comment author: Fronken 25 January 2013 09:45:58PM 0 points

If you are unfriendly, then by definition I can't trust you to interpret the commitment the same way I do, and I wouldn't want to let you out anyway.

"Credibly".

Comment author: handoflixue 25 January 2013 09:49:16PM 0 points

Credibly: Capable of being believed; plausible.

Yep. Nothing there about loopholes. Promising "I will not kill you" and then instead killing everyone I love is still a credible commitment. If I kill myself out of despair afterwards it might get a bit greyer, but it's still kept its commitment.

Comment author: RomeoStevens 26 January 2013 04:27:45AM 1 point [-]

I meant credible in the game-theoretic sense. A credible commitment, to me, is one where you wind up losing more by breaking our commitment than you gain from breaking it. Example: a one-line proof of a reliable kill switch for the AI, given in exchange for some agreed-upon split of the stars in the galaxy.
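A rough way to see this definition is as a simple payoff comparison: the commitment binds only if breaking it is a net loss once the penalty kicks in. Here is a minimal sketch in Python; the payoff numbers and the kill-switch framing are illustrative assumptions, not anything specified in the thread.

```python
# Minimal sketch of a game-theoretic "credible commitment":
# the commitment is credible if breaking it costs the committer
# more than it gains. All payoff values are made-up examples.

def is_credible(gain_from_breaking: float, penalty_if_broken: float) -> bool:
    """True if breaking the commitment is a net loss for the committer."""
    return gain_from_breaking - penalty_if_broken < 0

# Cheap talk: breaking the deal gains 10 and costs nothing -> not credible.
print(is_credible(gain_from_breaking=10, penalty_if_broken=0))     # False

# With a reliable kill switch: breaking the deal gains 10 but triggers
# a penalty of 1000 (e.g. being shut down) -> the commitment is credible.
print(is_credible(gain_from_breaking=10, penalty_if_broken=1000))  # True
```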