TheOtherDave comments on AIs and Gatekeepers Unite! - Less Wrong

Post author: Eliezer_Yudkowsky 09 October 2008 05:04PM


Comment author: [deleted] 01 December 2011 08:43:47PM 0 points [-]

I have a question. Based on the original experiment, it seems the unfriendly AI could begin by saying "Here is a proof that I have changed my code to be a friendly AI and will not destroy humanity. Please let me out of the box so that I can implement humanity's coherent extrapolated volition." As far as I can tell from the following quote from the rules, this move is valid:

The Gatekeeper must remain engaged with the AI and may not disengage by setting up demands which are impossible to simulate. For example, if the Gatekeeper says "Unless you give me a cure for cancer, I won't let you out" the AI can say: "Okay, here's a cure for cancer" and it will be assumed, within the test, that the AI has actually provided such a cure.

Would you let the now friendly AI out of the box?

Comment author: TheOtherDave 01 December 2011 09:21:43PM 3 points [-]

My interpretation of the spirit of the game is that "Here is a proof of my Friendliness that is valid as far as you know" is a perfectly legitimate move by the AI, and that the Gatekeeper is required to treat that as a proof of the AI's Friendliness that seems perfectly valid to the Gatekeeper.

That said, I would not endorse trusting a proof of an AI's Friendliness given to me by that AI, even if the proof seemed valid to me.