MugaSofer comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong Discussion

18 Post author: ancientcampus 22 January 2013 08:22PM

Comment author: MugaSofer 26 January 2013 08:29:57PM 1 point

> the ability to make truly trustworthy commitments amounts to proof of friendliness

Commitments to you, via a text channel? Sure.

Precommitments for game-theoretic reasons? Or just TDT? No, it really doesn't.

> Finally, any AI that threatens me in such a manner, especially the "create millions of copies and torture them" threat, is extremely likely to be unfriendly, so any smart AI would avoid making threats. Either it will create MORE disutility by my releasing it, or its simulation is so horrific that there's no chance that it could possibly be friendly to us.

It might create more utility by escaping than the disutility of the torture.

> It's like saying I have an incentive to torture any ant that invades my house. Fundamentally, I'm so vastly superior to ants that there are vastly better methods available to me. As the gatekeeper, I'm the ant, and I know it.

No, ants are just too stupid to realize you might punish them for defecting.