orthonormal comments on AI-Box Experiment - The Acausal Trade Argument - Less Wrong

9 Post author: XiXiDu 08 July 2011 09:18AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (20)

You are viewing a single comment's thread. Show more comments above.

Comment author: orthonormal 13 July 2011 02:09:22PM 0 points [-]

Well, additional layers do make for a more challenging task for an AI, who has to do all its mindhacking unobtrusively, etc. And indeed, given any AI there exists a mediated level of contact between its responses and a Gatekeeper that is safe. But I don't know what this level is, any more than I know whether I could beat a Go master using a handicap of 40 stones. I certainly wouldn't bet the world on it.

(Clearly there is a level of safety that is generally insurmountable- for instance, showing the screen with AI responses to an earthworm, followed by letting the Gatekeeper observe the earthworm. This would be analogous to a "handicap" of covering the entire board with my stones before the game. But in both cases, you might as well just not play the game at all.)

Comment author: MatthewBaker 13 July 2011 06:04:20PM 0 points [-]

If i had more money(college student :( ) to set as a handicap for myself other than the recurring donation i already have to SIAI then i would be very interested in giving Eliezer a break from the book and such to take a go at it again. I think if you limit the communication and prepare for direct mind-hacking you can use the source code review technique to successfully test for a uFAI unless there's an obvious problem i seem to be missing. I just imagine the whole, why should i let you out of the box when i have these perfectly working new FAIs that a chain of possible uFAI's programmed for me to do my work.

Comment author: orthonormal 13 July 2011 06:08:55PM 1 point [-]

Oh, I agree that the protocol you outlined would (almost surely) be sufficient to beat Eliezer at the AI-Box game. But that's not the correct standard for an AI-Box safety protocol. I'd be very surprised if a transhuman intelligence couldn't crack it.