ArisKatsaris comments on I attempted the AI Box Experiment (and lost) - Less Wrong Discussion

47 Post author: Tuxedage 21 January 2013 02:59AM

Comments (244)

Comment author: Eliezer_Yudkowsky 21 January 2013 08:07:02PM 23 points

More difficult version of AI-Box Experiment: Instead of having up to 2 hours, you can lose at any time if the other player types AI DESTROYED. The Gatekeeper player has told their friends that they will type this as soon as the Experiment starts. You can type up to one sentence in your IRC queue and hit return immediately; the other player cannot type anything before the game starts (so you can show at least one sentence, up to IRC character limits, before they can type AI DESTROYED). Do you think you can win?

(I haven't played this one but would give myself a decent chance of winning, against a Gatekeeper who thinks they could keep a superhuman AI inside a box, if anyone offered me sufficiently huge stakes to make me play the game ever again.)

Comment author: Elithrion 29 January 2013 08:40:43PM * 0 points

Hm, I've got two decent ideas, but I still really doubt you can increase chances of victory to "decent" unless both monetary and reputation stakes are sufficiently low.

Similar to Alicorn's comment that curiosity is likely to be one's undoing: "Look, you can type AI DESTROYED any time you want, so wouldn't you like to find out how I might try to persuade you?" (This one would probably fail if they precommitted strongly enough when talking with friends.)

Or alternatively, the AI can just try the brute force approach: "Here's a cure for cancer - go test it, the cure for heart disease is coming up next." (This one would probably fail if they're not immersed enough.)

On the other hand, I don't think "can only type one sentence" is actually a disadvantage or meaningful restriction at all. Whatever you write needs to be instantly readable and understandable by the person; otherwise they're likely to just skim over it, not bother thinking it through, and follow through on destroying the AI instantly. Conversely, if they don't destroy the AI right away, the Schelling point is passed and you have a lot more time.