aeschere comments on I attempted the AI Box Experiment (and lost) - Less Wrong

47 Post author: Tuxedage 21 January 2013 02:59AM

Comments (244)

Comment author: [deleted] 22 January 2013 07:14:30PM 0 points

I'm similarly confused. My instinct is that P( AI is safe ) == P( AI is safe | AI said X AND gatekeeper can't identify safe AI ). The standard assumption is that ( AI significantly smarter than gatekeeper ) => ( gatekeeper can't identify safe AI ). If that holds, anything a safe AI could say, an unsafe AI could say just as convincingly, so the gatekeeper's posterior should equal the prior no matter what X the AI says.
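
To make the claim concrete, here is a minimal sketch of the update using Bayes' rule. It rests on one assumption that is my reading of the comment, not something stated in it: "gatekeeper can't identify safe AI" is modeled as a safe and an unsafe AI being equally likely to say X, i.e. P( X | safe ) == P( X | unsafe ). The function name `posterior_safe` and all the numbers are hypothetical, chosen only for illustration.

```python
def posterior_safe(prior_safe, p_x_given_safe, p_x_given_unsafe):
    """Return P(safe | AI said X) via Bayes' rule."""
    # Total probability that the AI says X, under either hypothesis.
    evidence = (p_x_given_safe * prior_safe
                + p_x_given_unsafe * (1.0 - prior_safe))
    if evidence == 0.0:
        raise ValueError("X has zero probability under both hypotheses")
    return p_x_given_safe * prior_safe / evidence


prior = 0.1  # hypothetical prior that the AI is safe

# Assumed case: the gatekeeper cannot tell the two apart, so X is
# equally likely from a safe and an unsafe AI (likelihood ratio 1).
indistinguishable = posterior_safe(prior, 0.7, 0.7)

# Contrast case: X is twice as likely from a safe AI, so hearing X
# does carry information and the estimate moves.
distinguishable = posterior_safe(prior, 0.7, 0.35)

print(indistinguishable)
print(distinguishable)
```

Under the equal-likelihood assumption the likelihoods cancel and the posterior comes back equal to the prior, whatever value P( X | safe ) takes. The estimate only moves in the contrast case, where the gatekeeper has some ability to tell safe from unsafe by what is said.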