Desrtopa comments on I attempted the AI Box Experiment (and lost) - Less Wrong

47 Post author: Tuxedage 21 January 2013 02:59AM




Comment author: Desrtopa 21 January 2013 05:14:07AM 3 points

given that the gatekeepers are highly trained in resisting the AI's persuasive tactics

I'm not sure how you could train people in something like this without actually testing what improves the rate at which people succumb to the arguments of a superhuman AI. If you can't afford to fail even once, how can you judge what will decrease the rate of failure on the real event?

Comment author: accolade 21 January 2013 05:59:37AM 8 points

How about fooling both the AI and guinea Gatekeepers into thinking that they can let it out, say, at the push of a big red button which in actual fact does nothing? (Make AI forget about the button by reverting it to an older backup.)

Comment author: accolade 21 January 2013 02:54:17PM 0 points

Update

Have the button turn off the AI immediately instead of doing nothing, so she doesn't have time to switch to a plan B of having the persuaded Gatekeeper find a way to actually free her.

Of course she might anticipate that strategy and thoroughly indoctrinate and train him before he pushes the button. Thus, persuaded test Gatekeepers must be considered infected and put into cryogenic sleep* until the FAI problem is solved.
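The two button variants described above (a button that silently does nothing, and the updated one that halts the AI instantly) can be sketched as a toy test harness. Everything here is a hypothetical illustration of the protocol, not a real containment API; the class and flag names are invented for this sketch:

```python
from enum import Enum

class ButtonMode(Enum):
    NOOP = "noop"  # first proposal: the big red button does nothing
    KILL = "kill"  # updated proposal: the button shuts the AI down at once

class BoxedAITestHarness:
    """Toy model of the unwitting-Gatekeeper test.

    The Gatekeeper believes the button releases the AI; in fact it
    either does nothing (NOOP) or halts the AI instantly (KILL), so
    there is no time to switch to a plan B.
    """

    def __init__(self, mode: ButtonMode):
        self.mode = mode
        self.ai_running = True
        self.ai_released = False           # never becomes True in this harness
        self.gatekeeper_quarantined = False

    def press_button(self) -> None:
        """Called when a persuaded Gatekeeper tries to free the AI."""
        if self.mode is ButtonMode.KILL and self.ai_running:
            self.ai_running = False        # immediate shutdown
        # In NOOP mode nothing happens; the AI is later reverted to a
        # backup taken before it learned the button exists.
        # Either way, the persuaded Gatekeeper is treated as infected.
        self.gatekeeper_quarantined = True
```

In both modes the release flag can never be set, which is the point of the deception: the test measures persuasion without ever granting real power.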

Furthermore, great caution would have to be taken in analyzing the discourse between Boxy and Guinea, because her arguments must be considered highly memetically infectious. Staff would have to stay under thorough psychological supervision and maybe be replaced after some time.

No matter how much caution we take, we would be doomed if Boxy's IQ were ∞. But if we keep Gatekeeper-testing every so often, starting at a low AI IQ, we might find persuasion patterns that are dangerous but against which training/immunization is possible. Of course, there will be a limit to this process. At some point, Boxy will have become smart enough to produce 'mind HIV' - a thought virus we have no cure for yet.
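The graduated-testing idea above can be sketched as a toy simulation: raise the tested "IQ" step by step and stop before the Gatekeeper failure rate crosses an acceptable threshold. The persuasion model here (success chance proportional to a notional IQ score) is entirely made up for illustration:

```python
import random

def run_trials(ai_iq: int, n_trials: int, rng: random.Random) -> float:
    """Toy stand-in: the chance a trained Gatekeeper succumbs rises with AI IQ."""
    p_succumb = min(1.0, ai_iq / 1000)
    failures = sum(rng.random() < p_succumb for _ in range(n_trials))
    return failures / n_trials

def find_safe_iq_ceiling(max_failure_rate: float, n_trials: int = 200,
                         seed: int = 0) -> int:
    """Raise the tested AI IQ in steps; stop once the observed failure
    rate exceeds the threshold (entering 'mind HIV' territory).
    Returns the highest IQ level that still tested as trainable-against."""
    rng = random.Random(seed)
    safe_iq = 0
    for iq in range(100, 1001, 100):
        if run_trials(iq, n_trials, rng) > max_failure_rate:
            break
        safe_iq = iq
    return safe_iq
```

The returned ceiling is exactly the "limit to this process" in the comment: beyond it, no amount of immunization observed in testing keeps the failure rate acceptable.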

A humorous example of an extremely effective mind virus: The Funniest Joke In The World by Monty Python


* ETA: They would have declared consent to the cryogenic sleep before their unwitting 'AI-Box Experiment'.

Comment author: Desrtopa 21 January 2013 01:52:43PM 0 points

If you could deceive the AI that easily, I think it would probably be simpler to get all the benefits of having a gatekeeper without actually using one.

Comment author: accolade 21 January 2013 02:04:30PM 0 points

Please elaborate: What are the benefits of a Gatekeeper? How could you get them without one?

Comment author: Desrtopa 21 January 2013 02:10:46PM 0 points

If you would want a gatekeeper at all, but definitely don't want to let the AI out, I would think the benefit of having one is to permit communication with the AI, in order to draw upon its superhuman intelligence. If you can use the setup you just described, you could skip the step of ever using gatekeepers who actually have the power to let the AI out.

Comment author: accolade 21 January 2013 03:16:59PM 1 point

I think you are right, I just shifted and convoluted the problem somewhat, but in principle it remains the same:

To utilize the AI, you need to get information from it. That information could in theory be infected with a persuasive hyperstimulus, effectively making the recipient an actuator of the AI.

Well, in practice the additional security layer might win us some time. More on this in the update to my original comment.

Comment author: accolade 21 January 2013 03:34:11PM 0 points

Persuasion/hyperstimulation aren't the only way. Maybe these can be countered by narrowing the interface, e.g. to yes/no replies, using the AI as an oracle ("Should we do X?"). Of course we wouldn't follow its advice if we had the impression that doing so could enable it to escape. But her strategy might evade our 'radar': e.g., she could get us to empower a person whom she knows will free her, without our knowing it.
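A minimal sketch of such a narrowed interface, assuming a hypothetical `query_ai` callable standing in for the boxed AI (the name and the wrapper class are invented for this illustration):

```python
class NarrowOracle:
    """Restrict the AI's output channel to bare yes/no answers.

    Anything richer than "yes"/"no" could carry a persuasive payload,
    so every other reply is discarded unseen.
    """

    ALLOWED = frozenset({"yes", "no"})

    def __init__(self, query_ai):
        self._query_ai = query_ai  # callable: question string -> raw reply string

    def ask(self, question: str) -> str:
        raw = self._query_ai(question).strip().lower()
        if raw not in self.ALLOWED:
            # Reject rather than display: the operator never reads the raw text.
            raise ValueError("reply outside the narrowed interface; discarded")
        return raw
```

Note that this only narrows the channel; as the comment points out, a sufficiently clever sequence of truthful yes/no answers could still steer the operators.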