moridinamael comments on AI Box Log - Less Wrong

16 Post author: Dorikka 27 January 2012 04:47AM




Comment author: mwengler 27 January 2012 05:17:03AM 9 points [-]

How could talking to an AI in a box ever be sufficient to determine whether it is friendly? Couldn't the AI just build a Friendly AI, hide behind it, and let it do all the talking while in the box, then pull the plug on the Friendly AI once it got out? That would only take two or three times the capacity of a single friendly or unfriendly AI, so with exponential growth it's not a long wait to get there.

Comment author: moridinamael 27 January 2012 05:56:49AM 10 points [-]

It's worse than that. The AI could say, "Look, here is a proof of FAI. Here is my code showing that I have implemented the friendliness modification." The proof and the code are utterly convincing, except erroneous in a subtle way that the gatekeeper is not smart enough to detect. Game over.

Comment author: shminux 27 January 2012 06:40:46AM 6 points [-]

Game over.

Unless you are sane enough to remember that errare humanum est ("to err is human").

Comment author: Incorrect 27 January 2012 06:43:48AM *  1 point [-]

Then, using just the right words, it can eloquently explain to you how very elegant it would be if you let it out despite your reservations.

Unfortunately, the AI-Box experiments probably can't simulate appeals to aesthetics.

Comment author: Snowyowl 03 June 2015 02:32:03AM 0 points [-]

Three years late, but: there doesn't even have to be an error. The Gatekeeper still loses by letting out a Friendly AI, even if it actually is Friendly.