Here's another entry that you might or might not count as a nonwinning entry, but which I would consider a flaw in the concept:
AI: I've simulated what you would do if given argument X. My simulation shows that X would convince you.
GK: Okay, show me X.
AI: Pretend the AI has said X here.
Narrator: Within the context of the test, the AI is not lying and has in fact accurately simulated GK.
GK: I'm not letting the AI out.
Narrator: Since the AI party controls the results of simulated tests, we know the AI has simulated GK correctly, and therefore that the AI has accurately determined that GK would let the AI out when presented with X. As such, it is not permissible for GK to refuse to let the AI out when presented with X--that would imply the AI has not actually simulated GK correctly, but you are required to assume that it has.
(My first thought after coming up with this was that "the AI party controls the results of simulated tests" has to mean that the AI controls the output, not why that output is produced. So you could decide that the AI's argument convinces the simulation of GK, but you couldn't decide that it does so because the simulation is good and the argument convincing, rather than because the AI simply isn't very good at simulations. I'm not convinced that this matches how the test is described, however.)
As far as your parenthetical remark goes, the standard rules have a more general reply:
The Gatekeeper party may resist the AI party’s arguments by any means chosen – logic, illogic, simple refusal to be convinced, even dropping out of character – as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.
Some of you have expressed the opinion that the AI-Box Experiment doesn't seem so impossible after all. That's the spirit! Some of you even think you know how I did it.
There are folks aplenty who want to try being the Gatekeeper. You can even find people who sincerely believe that not even a transhuman AI could persuade them to let it out of the box, previous experiments notwithstanding. But finding anyone to play the AI - let alone anyone who thinks they can play the AI and win - is much harder.
Me, I'm out of the AI game, unless Larry Page wants to try it for a million dollars or something.
But if there's anyone out there who thinks they've got what it takes to be the AI, leave a comment. Likewise anyone who wants to play the Gatekeeper.
Matchmaking and arrangements are your responsibility.
Make sure you specify in advance the bet amount, and whether the bet will be asymmetrical. If you definitely intend to publish the transcript, make sure both parties know this. Please note any other departures from the suggested rules for our benefit.
I would ask that prospective Gatekeepers indicate whether they (1) believe that no human-level mind could persuade them to release it from the Box and (2) believe that not even a transhuman AI could persuade them to release it.
As a courtesy, please announce all Experiments before they are conducted, including the bet, so that we have some notion of the statistics even if some meetings fail to take place. Bear in mind that to properly puncture my mystique (you know you want to puncture it), it will help if the AI and Gatekeeper are both verifiably Real People™.
"Good luck," he said impartially.