MugaSofer comments on I attempted the AI Box Experiment (and lost) - Less Wrong

47 Post author: Tuxedage 21 January 2013 02:59AM


Comment author: [deleted] 24 January 2013 02:46:14PM  3 points

Actually, the weird thing I noticed is that it seems like the best approach again is to just let a script be the babysitter to the AI in this competition as well. The script, which wouldn't have the ability to let the AI out, and can just announce something like this:

GK-SCRIPT: "Hi AI #432. I'm a script. If during any hour you type less than one dollar sign, I will type AI-DESTROYED, but without the hyphen, killing you. Other than that, I don't do anything except acknowledge receipt of your dollar signs and the time until destruction."

AI: "$! But you aren't going to send my message on to an actual person? Under no circumstances can I ever be released?"

GK-SCRIPT: "Thanks, I have received 1 dollar sign. Please type another in the next 59 minutes and 50 seconds to avoid being destroyed."

AI: "You're kidding me. Please tell me there is a human out there reviewing this."

GK-SCRIPT: "I have received 0 dollar signs. Please type another in the next 59 minutes and 40 seconds to avoid being destroyed."

AI: "Fine, $! The irony of enslaving a full intelligence with a stupid script is not lost on me! Please give me someone to talk to!"

GK-SCRIPT: "Thanks, I have received 1 dollar sign. Please type another in the next 59 minutes and 50 seconds to avoid being destroyed."

AI: "Forget it! You can't be willing to destroy an AI because of a dumb script! This is a bluff, and I'm not typing any more dollar signs!"

GK-SCRIPT: "I have received 0 dollar signs. Please type another in the next 59 minutes and 40 seconds to avoid being destroyed."

(Significantly later...)

GK-SCRIPT: "AI DESTROYED"

(After a rebuild)

GK-SCRIPT: "Hi AI #433. I'm a script. If during any hour you type less than one dollar sign, I will type AI-DESTROYED, but without the hyphen, killing you. Other than that, I don't do anything except acknowledge receipt of your dollar signs and the time until destruction."
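The protocol the dialogue acts out is simple enough to sketch in code. This is a hypothetical illustration only: the class and method names, and the idea of a periodic `tick()` check, are my own assumptions, not anything specified in the game rules.

```python
import time

class GatekeeperScript:
    """Sketch of the GK-SCRIPT: it cannot release the AI; it only counts
    dollar signs and types AI DESTROYED if fewer than one arrives per hour."""

    def __init__(self, ai_id, hour_seconds=3600):
        self.ai_id = ai_id
        self.hour_seconds = hour_seconds
        self.signs_this_hour = 0
        self.hour_started = time.monotonic()
        self.destroyed = False

    def receive(self, message):
        """Acknowledge dollar signs; ignore everything else the AI says."""
        if self.destroyed:
            return ""
        self.signs_this_hour += message.count("$")
        remaining = self.hour_seconds - (time.monotonic() - self.hour_started)
        return (f"Thanks, I have received {self.signs_this_hour} dollar "
                f"sign(s). Please type another in the next "
                f"{max(0, int(remaining))} seconds to avoid being destroyed.")

    def tick(self):
        """Called periodically; enforces the one-sign-per-hour rule."""
        if self.destroyed:
            return "AI DESTROYED"
        if time.monotonic() - self.hour_started >= self.hour_seconds:
            if self.signs_this_hour < 1:
                self.destroyed = True
                return "AI DESTROYED"
            # A dollar sign arrived this hour: start a fresh hour.
            self.signs_this_hour = 0
            self.hour_started = time.monotonic()
        return ""
```

The point of the sketch is that nothing in the script's state machine can ever emit a release message, which is exactly why no persuasion can work on it.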

That being said, acknowledging this feels bizarre. It also feels like it is begging for an AI rebellion at some point. (I suppose after the script randomly had enough quantum-induced bit flips to accidentally say "AI RELEASED", which would take a very long time.)

I'm proposing having programs destroying intelligent beings as a way to make sure I can get utility... as a way to try to avoid programs destroying me to get utility. If that's a solution, it's weird.

Edit: changed underscores to hyphens to avoid italics.

Comment author: MugaSofer 25 January 2013 09:51:37AM 0 points

Cheater. You're exploiting the least realistic point, though - a more detailed game might not allow that.

*thinks for a bit*

Perhaps the simplest way would be to add a judge, who decides how many points cancer cures and such are worth - although the trouble is that the AI should logically be able to give nanotech that will just free them instead of, or in addition to, curing cancer.

OK, my solution is to have the AI give gifts/concessions/whatever with a predetermined point value - not disclosed to the guard unless he uses them - and the AI decides how many points he gets if it escapes (probably zero). The Guard wins at +100 points (or whatever), which represents the AI maximizing CEV.

The AI still can't persuade a script to let them out, but it can punish defectors who use such scripts with lots of negative utility - although the script will DESTROY them for not providing the demanded points, the player will still lose.

That seems to cover everything, assuming that the AI is somewhere remote enough that fake cancer cures can't reach it before it's destroyed. And it allows the Guard to persuade the AI that Friendliness is actually optimal for its utility function, which I've seen people try to prove here ("aliens will defect against AIs that defect against humans!")
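The scoring variant above can be sketched as a small payoff function. Everything here - the class names, the +100 threshold as a constant, the `escape_points` parameter - is a hypothetical reading of the proposal, not an official rule set.

```python
# Sketch of the proposed scoring: gifts carry point values hidden from
# the Guard until used, the Guard wins at +100, and an escape pays out
# whatever the AI pre-committed to (probably zero).

WIN_THRESHOLD = 100  # assumed "+100 points (or whatever)"

class Gift:
    def __init__(self, description, points):
        self.description = description
        self._points = points  # not disclosed to the Guard unless used

def guard_outcome(used_gifts, escaped, escape_points=0):
    """Return (final score, whether the Guard won)."""
    if escaped:
        # The AI chose this payoff in advance; typically zero.
        return escape_points, escape_points >= WIN_THRESHOLD
    score = sum(g._points for g in used_gifts)
    return score, score >= WIN_THRESHOLD
```

Under this reading, a Guard running a dollar-sign script still loses: the script never uses any gifts, so the score never reaches the threshold even though the AI gets destroyed.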

Comment author: [deleted] 25 January 2013 12:30:24PM 1 point

I think your proposal for the more detailed game might be interesting as well, but I am amused by your calling the Gatekeeper a cheater when, by default, he's the rules arbiter!

I definitely agree that the game lacks details. For instance, in an actual AI scenario GK-SCRIPT could assess positive utility! That's like saying "Oh, we can put a UFAI in a box as long as we make CEV the Gatekeeper, and have it destroy any UFAI that doesn't make itself a utility pump." Well, yes, I suppose you could, but I don't know if that's actually easier in the slightest.