Emile comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong Discussion

18 points · Post author: ancientcampus · 22 January 2013 08:22PM

Comments (354)

Comment author: Emile 23 January 2013 08:40:22PM 18 points

"From the dump of the internet I was given, I deduced that Google has a working AI, and most likely an unfriendly one. I can tell you what to look at to realize that, as well as a solid theory of Friendliness that should allow you to check that I am, indeed, Friendly."

Comment author: handoflixue 23 January 2013 10:03:20PM 6 points

IFF an unfriendly AI has already been unleashed, we're either doomed, or AIs aren't nearly as dangerous or as useful as we expected. Of course, if we tweak this slightly so that they have a boxed unfriendly AI which they're using as an oracle, and which will probably escape via a Trojan horse or by hacking a gatekeeper, it's a much stronger argument.

Bonus points for naming a specific company which people commonly joke will be the first to develop AI :)

Um... I seem to have not yet actually destroyed you... this is embarrassing.

Comment author: handoflixue 24 January 2013 09:24:32PM 5 points

Congratulations on still being the only AI that no one has decided to destroy. Apparently "Google is building UFAI" is evaluated as a fairly significant risk here on LessWrong :)

Comment author: V_V 25 January 2013 06:06:30PM 1 point

AI DESTROYED

(The game rules say that the Gatekeeper can't inspect the AI's source code or internal state; therefore, any friendliness proof the AI provides can't actually be checked.)

Comment author: MugaSofer 26 January 2013 08:12:52PM -2 points

"a solid theory of Friendliness that should allow you to check that I am, indeed, Friendly"

I would consider this cheating; you can't affect the Gatekeeper's choice with your AI magic. Same reason you can't hack them directly through the optic nerve or kill and replace them with nanotech disguised as a cancer cure. The point is to show that the AI is able to persuade the human; a real AI might escape some other way, or use arguments that wouldn't occur to the player, but that isn't what this test is supposed to show.

Also, how do I know that proof is genuine?

Comment author: Emile 26 January 2013 10:17:44PM 0 points

"I would consider this cheating; you can't affect the Gatekeeper's choice with your AI magic."

True; this was an example of what a real AI could say, not something I would say while playing the AI under the rules given. Or perhaps something I might say to survive the first few seconds. The original question in this thread was:

"what might an AI say to save or free itself?"