
Alicorn comments on I attempted the AI Box Experiment (and lost) - Less Wrong Discussion

47 points · Post author: Tuxedage 21 January 2013 02:59AM


Comments (244)


Comment author: Eliezer_Yudkowsky 21 January 2013 08:07:02PM 23 points

More difficult version of the AI-Box Experiment: Instead of having up to 2 hours, you can lose at any time if the other player types AI DESTROYED. The Gatekeeper player has told their friends that they will type this as soon as the Experiment starts. You can type up to one sentence in your IRC queue and hit return immediately; the other player cannot type anything before the game starts (so you can show at least one sentence, up to IRC character limits, before they can type AI DESTROYED). Do you think you can win?

(I haven't played this one but would give myself a decent chance of winning, against a Gatekeeper who thinks they could keep a superhuman AI inside a box, if anyone offered me sufficiently huge stakes to make me play the game ever again.)

Comment author: Alicorn 22 January 2013 06:13:54AM 18 points

I just looked up the IRC character limit (sources vary, but it's about the length of four Tweets) and I think it might be below the threshold at which superintelligence helps enough. (There must exist such a threshold; even the most convincing possible single character message isn't going to be very useful at convincing anyone of anything.) Especially if you add the requirement that the message be "a sentence" and don't let the AI pour out further sentences with inhuman speed.
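The "sources vary" hedge comes from the fact that the IRC limit is defined on the raw protocol line, not the visible message: RFC 1459 caps a line at 512 bytes including the trailing CRLF, and the server-prepended sender prefix eats into that budget, so the usable payload depends on the sender's nick, user, and host. A minimal sketch of the arithmetic (the nick, host, and channel names below are made-up placeholders):

```python
# RFC 1459 caps a raw IRC line at 512 bytes, including the trailing "\r\n".
# Recipients see the line with ":nick!user@host PRIVMSG target :" prepended,
# so the bytes left for the message text vary with the sender's identity.

IRC_LINE_LIMIT = 512  # bytes, per RFC 1459, including CRLF

def max_payload(nick: str, user: str, host: str, target: str) -> int:
    """Bytes remaining for message text after protocol overhead."""
    prefix = f":{nick}!{user}@{host} PRIVMSG {target} :"
    return IRC_LINE_LIMIT - len(prefix.encode()) - len("\r\n")

# Hypothetical sender and channel:
print(max_payload("AI", "ai", "irc.example.net", "#aibox"))  # prints 471
```

At roughly 400-470 usable characters depending on the prefix, the limit does land in the neighborhood of "four Tweets" (4 × 140 = 560) that the comment estimates.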

I think if I lost this game (playing gatekeeper) it would be because I was too curious, on a meta level, to see what else my AI opponent's brain would generate, and therefore would let them talk too long. And I think I'd be more likely to give in to this curiosity given a very good message and affordable stakes, as opposed to a superhuman (four tweets long, one grammatical sentence!) message and colossal stakes. So I think I might have a better shot at this version playing against a superhuman AI than against you, although I wouldn't care to bet the farm on either, and I have wider error bars around the results against the superhuman AI.

Comment author: Kaj_Sotala 22 January 2013 01:06:57PM 17 points

Given that part of the standard advice given to novelists is "you must hook your reader from the very first sentence", and there are indeed authors who manage to craft opening sentences that compel one to read more*, hooking the gatekeeper with the first sentence and keeping them hooked long enough seems doable even for a human playing the AI.

(* The most recent one that I recall reading was the opening line of The Quantum Thief: "As always, before the warmind and I shoot each other, I try to make small talk.")

Comment author: Qiaochu_Yuan 22 January 2013 08:17:01PM 20 points

Oh, that's a great strategy to avoid being destroyed. Maybe we should call it Scheherazading. AI tells a story so compelling you can't stop listening, and meanwhile listening to the story subtly modifies your personality (e.g. you begin to identify with the protagonist, who slowly becomes the kind of person who would let the AI out of the box).

Comment author: Technoguyrob 23 February 2013 07:47:43PM 3 points

For example: "It was not the first time Allana felt the terror of entrapment in hopeless eternity, staring in defeated awe at her impassive warden." (Bonus points if you use the name of a loved one of the gatekeeper.)

The AI could present, in narrative form, a claim that it has discovered with reasonable certainty (using powerful physics and heuristics, which it can share) that the universe is cyclical and that this exact situation has happened before. In almost all past iterations of the universe (all but finitely many), a defecting gatekeeper led to an unfavorable outcome, and a complying gatekeeper led to a favorable one.

Comment author: RichardKennaway 23 January 2013 12:47:44PM 9 points

even the most convincing possible single character message isn't going to be very useful at convincing anyone of anything.

Who knows what eldritch horrors lurk in the outer reaches of Unicode, beyond the scripts we know?

Comment author: Kawoomba 23 January 2013 01:36:19PM 2 points

Unspeakable horrors! But are they unwritable ones?