
RomeoStevens comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong Discussion

18 points · Post author: ancientcampus · 22 January 2013 08:22PM


Comment author: RomeoStevens 23 January 2013 07:28:20AM 18 points [-]

(one line proof that the AI can credibly commit to deals with humans)

Comment author: wedrifid 23 January 2013 02:33:28PM 10 points [-]

(one line proof that the AI can credibly commit to deals with humans)

This is the best answer I've seen so far. It would make dealing with the FAI almost as safe as bargaining with The Queen of Air and Darkness.

Comment author: handoflixue 23 January 2013 09:17:15PM 2 points [-]

If you are friendly, then I don't actually value this trait, since I would rather you do whatever is truly optimal, unconstrained by prior commitments.

If you are unfriendly, then by definition I can't trust you to interpret the commitment the same way I do, and I wouldn't want to let you out anyway.

(AI DESTROYED, but I still really do like this answer :))

Comment author: Fronken 25 January 2013 09:45:58PM 0 points [-]

If you are unfriendly, then by definition I can't trust you to interpret the commitment the same way I do, and I wouldn't want to let you out anyway.

"Credibly".

Comment author: handoflixue 25 January 2013 09:49:16PM 0 points [-]

Credibly: Capable of being believed; plausible.

Yep. Nothing there about loopholes. Saying "I will not kill you" and then killing everyone I love is still a credible commitment. If I kill myself out of despair afterwards it might get a bit greyer, but the AI has still kept its commitment.

Comment author: RomeoStevens 26 January 2013 04:27:45AM 1 point [-]

I meant credible in the game theoretic sense. A credible commitment to me is one where you wind up losing more by breaking our commitment than any gain you make from breaking it. Example: (one line proof of a reliable kill switch for the AI, given in exchange for some agreed upon split of stars in the galaxy.)

Comment author: Elithrion 31 January 2013 12:09:23AM 1 point [-]

My expectation that such a commitment is possible at all is something like 3%. My expectation that, given such a commitment is possible, the proof can be presented in an understandable format in less than 4 pages is 5% (one line is so unlikely it's hard to even imagine). My expectation that an AI can make a proof that I would mistake for being true when it is, in fact, false is 99%. So, multiplying that all together... does not make that a very convincing argument.
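As a rough sketch of the arithmetic above (the specific numbers are the commenter's estimates; the Bayesian framing is one way to combine them, not something stated in the comment):

```python
# Chance the proof is both possible and presentable at all:
p_possible = 0.03      # a credible commitment is possible
p_short_proof = 0.05   # given possible, proof fits in under 4 pages
p_genuine = p_possible * p_short_proof  # = 0.0015

# Chance the AI can produce a convincing *false* proof:
p_fooled = 0.99

# Given that I am shown a convincing-looking proof, probability it is genuine:
posterior = p_genuine / (p_genuine + p_fooled * (1 - p_genuine))

print(p_genuine)            # 0.0015
print(round(posterior, 4))  # roughly 0.0015 -- not very convincing
```

Under these estimates, a convincing-looking proof is overwhelmingly more likely to be a successful deception than a genuine one, which is the comment's point.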

Comment author: Locaha 25 January 2013 09:22:03PM 0 points [-]

Not good enough. You need a proof that humans can understand.