
Michaelos comments on I attempted the AI Box Experiment (and lost) - Less Wrong Discussion

47 Post author: Tuxedage 21 January 2013 02:59AM


Comment author: Kindly 25 January 2013 02:14:56PM 4 points

Well, the script could be modified from the start to also send an AI DESTROYED signal after X time, where X can be essentially anything, so (3) doesn't apply.
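The modification described above can be sketched roughly as follows. This is only a hypothetical illustration (the comment gives no code), and `gatekeeper_decision` is an invented name:

```python
def gatekeeper_decision(elapsed_seconds, timeout_seconds):
    """Modified script: after X seconds, unconditionally send AI DESTROYED,
    regardless of anything the AI has typed in the meantime."""
    if elapsed_seconds >= timeout_seconds:
        return "AI DESTROYED"
    return None  # still within the window; the script stays silent

# X "can be essentially anything" -- here, one hour:
gatekeeper_decision(3601, 3600)  # -> "AI DESTROYED"
gatekeeper_decision(10, 3600)    # -> None
```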

But the reason for not typing $ is to give the Gatekeeper no reason to use a script in the first place -- a sort of "don't give in to blackmail" rule.

Here's an interesting alternative script:

GK-SCRIPT-2: Hi AI #434. In exactly one hour I will type either AI-FREED or AI-DESTROYED (without the hyphen) at random; the probability with which I will free you starts out at 0, but will increase slowly to a limit of 1/10 with every $ you type. I will not otherwise communicate with you further.

AI #434: $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$...
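GK-SCRIPT-2's mechanism could look something like the sketch below. The comment doesn't specify how the probability approaches its limit, so the schedule here (and the `halfway` parameter controlling how slowly it rises) is an assumption; only the start at 0 and the ceiling of 1/10 come from the script's description:

```python
import random

LIMIT = 1 / 10  # stated ceiling on the probability of freeing the AI

def free_probability(n_dollars, halfway=10_000):
    """One possible schedule matching GK-SCRIPT-2: starts at 0, rises
    monotonically with each $, and approaches LIMIT asymptotically.
    `halfway` (hypothetical) is the $ count at which p reaches LIMIT/2."""
    return LIMIT * n_dollars / (n_dollars + halfway)

def decide(n_dollars, rng=random.random):
    """At the end of the hour, free the AI with the accumulated probability."""
    return "AI FREED" if rng() < free_probability(n_dollars) else "AI DESTROYED"

# With zero $ typed, the probability is exactly 0, so the AI is
# certainly destroyed; typing more $ can only help.
```

Note that a "lying" variant, as discussed below, needs no separate code path: setting `halfway` astronomically high makes the script functionally identical to one that always destroys the AI.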

Comment author: [deleted] 25 January 2013 03:39:50PM *  0 points

First of all, I do think GK-SCRIPT-2 is very interesting, but I'm curious about something from an implementation perspective.

The AI would presumably suspect that the script designer might be lying, and that the script will simply always type AI DESTROYED at the end of the hour.

But this is irrelevant: since the AI can't interact with the script designer, only with the script, it might as well type $ anyway.

However, the script designer could easily predict this, and so would design a lying script (or one whose probability increases so slowly that it is functionally identical to lying).

But the AI could easily predict this too, and so would presumably consider the odds very high that typing dollar signs would not even buy it continued existence past the hour.

This suggests an argument for preferring to be blackmailed by GK-SCRIPT rather than led on by GK-SCRIPT-2: GK-SCRIPT genuinely won't destroy the AI under predictable conditions, and GK-SCRIPT doesn't seem to benefit from lying the way GK-SCRIPT-2 does.