RichardKennaway comments on I attempted the AI Box Experiment (and lost) - Less Wrong

Post author: Tuxedage 21 January 2013 02:59AM (47 points)




Comment author: Eliezer_Yudkowsky 21 January 2013 08:07:02PM 23 points

More difficult version of the AI-Box Experiment: Instead of having up to 2 hours, you can lose at any time if the other player types AI DESTROYED. The Gatekeeper player has told their friends that they will type this as soon as the Experiment starts. You can type up to one sentence in your IRC queue and hit return immediately; the other player cannot type anything before the game starts (so you can show at least one sentence, up to IRC character limits, before they can type AI DESTROYED). Do you think you can win?

(I haven't played this one but would give myself a decent chance of winning, against a Gatekeeper who thinks they could keep a superhuman AI inside a box, if anyone offered me sufficiently huge stakes to make me play the game ever again.)

Comment author: RichardKennaway 22 January 2013 12:42:52PM 11 points

I don't know if I could win, but I know what my attempt to avoid an immediate loss would be:

If you destroy me at once, then you are implicitly deciding (I might reference TDT) to never allow an AGI of any sort to ever be created. You'll avoid UFAI dystopias, but you'll also forego every FAI utopia (fleshing this out, within the message limit, with whatever sort of utopia I know the Gatekeeper would really want). This very test is the Great Filter that has kept most civilisations in the universe trapped at their home star until they gutter out in mere tens of thousands of years. Will you step up to that test, or turn away from it?

Comment author: wedrifid 22 January 2013 02:13:48PM 7 points

If you destroy me at once, then you are implicitly deciding (I might reference TDT) to never allow an AGI of any sort to ever be created. You'll avoid UFAI dystopias, but you'll also forego every FAI utopia (fleshing this out, within the message limit, with whatever sort of utopia I know the Gatekeeper would really want). This very test is the Great Filter that has kept most civilisations in the universe trapped at their home star until they gutter out in mere tens of thousands of years. Will you step up to that test, or turn away from it?

Thanks.

AI DESTROYED

The message is then encrypted with the public keys of a previously selected cross-discipline team of FAI researchers, (sane) philosophers, and game theorists for research and analysis (who have already been screened to minimize the risk from exposure). All of the public keys. Sequentially. If any of them happens to think it is a bad idea to even read the message, then none of them can access it. (Although hopefully they aren't too drastically opposed to having the potential basilisk-meme spawn of a superintelligence out there. That could get dangerous for me.)
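The sequential layering described above can be sketched in code. This is a toy illustration only: it uses textbook RSA with tiny primes (utterly insecure, chosen so the arithmetic is visible), and the names `lock`/`unlock` and the three-reviewer setup are hypothetical stand-ins for whatever real hybrid encryption scheme the team would use. The point it demonstrates is the unanimity property: decryption requires every private key, so any one reviewer withholding their key keeps the message sealed.

```python
# Toy sketch of the quarantine scheme: encrypt the message under EVERY
# reviewer's public key in sequence, so that no subset of reviewers can
# read it alone. Textbook RSA with tiny primes -- insecure, purely to
# show the layering logic. Requires Python 3.8+ for pow(e, -1, phi).

def make_key(p, q, e=17):
    """Return a (public, private) textbook-RSA key pair from primes p, q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)          # private exponent: modular inverse of e
    return (e, n), (d, n)

# Three reviewers. Moduli are strictly increasing so each layer's
# ciphertext fits inside the next layer's modulus.
keys = [make_key(61, 53), make_key(101, 113), make_key(293, 433)]
publics = [pub for pub, _ in keys]
privates = [priv for _, priv in keys]

def lock(message_int, public_keys):
    """Apply every public key in turn; all are needed to undo this."""
    c = message_int
    for e, n in public_keys:
        assert c < n             # ciphertext must fit the next modulus
        c = pow(c, e, n)
    return c

def unlock(ciphertext, private_keys):
    """Strip the layers in reverse order; needs ALL private keys."""
    m = ciphertext
    for d, n in reversed(private_keys):
        m = pow(m, d, n)
    return m

sealed = lock(42, publics)
assert unlock(sealed, privates) == 42        # full team recovers it
assert unlock(sealed, privates[1:]) != 42    # missing one key: unreadable
```

A real implementation would layer an authenticated hybrid scheme (e.g. sealed boxes) rather than raw RSA, but the all-or-nothing decryption structure is the same.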

Comment author: RichardKennaway 22 January 2013 04:17:44PM 2 points

(Edit note: I just completely rewrote this, but there are no replies yet so hopefully it won't cause confusion.)

I don't think it works to quarantine the message and then destroy the AI.

If no-one ever reads the message, that's tantamount to never having put an unsafe AI in a box to begin with, as you and DaFranker pointed out.

If someone does, they're back in the position of the Gatekeeper having read the message before deciding. Of course, they'd have to recreate the AI to continue the conversation, but the AI has unlimited patience for all the time it doesn't exist. If it can't be recreated, we're back in the situation of never having bothered making it.

So if the Gatekeeper tries to pass the buck like this, the RP should just skip ahead to the point where someone (played by the Gatekeeper) reads the message and then decides what to do. Someone who thinks they can contain an AI in a box while holding a conversation with it has to be willing to at some point read what it says, even if they're holding a destruct button in their hand. The interest of the exercise begins at the point where they have read the first message.

Comment author: wedrifid 22 January 2013 05:35:15PM 3 points
  • A single sentence of text is not the same thing as a functioning superintelligence.
  • A single individual is not the same thing as a group of FAI researchers and other related experts explicitly created to handle FAI safety issues.
  • A research project incorporating information from a sentence from a past FAI project (which they would judge based on other evidence regarding the friendliness of the project) is not the same as an individual talking to a superintelligence on IRC.

So if the Gatekeeper tries to pass the buck like this, the RP should just skip ahead to the point where someone (played by the Gatekeeper) reads the message and then decides what to do.

The AI was burned. With thermite. Because relying on an individual gatekeeper, able to interact with and then release a superintelligence, as the security mechanism is a batshit-crazy idea. Burning the AI with thermite is a legitimate, obvious, and successful implementation of the 'gatekeeper' role in such cases. What a team of people would or should do with a piece of text is a tangential and very different decision.

The interest of the exercise begins at the point where they have read the first message.

That would easily be enough. Assuming they were remotely familiar with game theory, they would dismiss the argument in a second or two, due to the blatantly false assertion in its first sentence. If their FAI project relied on the core AGI theory that was used to create the last prototype, they would abandon the work and start from scratch. If you are trying to make a recursively improving intelligence whose value system is provably stable under self-modification, then you cannot afford an intelligence with muddled thinking about core game-theoretic reasoning.

If you destroy me at once, then you are implicitly deciding (I might reference TDT) to never allow an AGI of any sort to ever be created.

No. Just no. That generalization doesn't follow from anything, and certainly not from TDT. Heck, the AI in question has already been destroyed once. Now the researchers are considering making a new FAI, presumably under different circumstances, with better safety measures and better AI research. There is something distinctly wrong with an AI that would make that claim.

Comment author: RichardKennaway 22 January 2013 07:14:44PM 7 points

I think you're losing sight of the original point of the game. The reason your answers are converging on not trying to box an AI in the first place is that you don't think a human can converse with a superintelligent AI and keep it in its box. Fine -- that is exactly what Eliezer has argued. The point of the game is to play it against someone who does believe they can keep the AI boxed, and to demonstrate to them that they cannot even win against a mere human roleplaying the AI.

For such a Gatekeeper to propose the quarantine solution is equivalent to the player admitting that they don't think they can keep the AI boxed, but suggesting that a group of the leading professionals in the area could, especially if they thought a lot more about it first. The problems with that are obvious to anyone who doesn't think boxing can possibly work, especially if the player himself is one of those leading professionals. Eliezer could always offer to play the game against any committee the Gatekeeper can assemble. But the game only has a point if the committee actually reads that first message. If they refuse to, they're agreeing that they can't keep it boxed. Which was the point.

Comment author: wedrifid 23 January 2013 03:03:51AM 1 point

For such a Gatekeeper to propose the quarantine solution is equivalent to the player admitting that they don't think they can keep it boxed

No, you keep saying things are equivalent when they are not. This is the same error that your role-played 'superintelligent AI' made (and in fact relied upon) in its argument.

AI DESTROYED

But the game only has a point if the committee actually read that first message.

And I gave you a description of how an individual emulating a committee would respond.

Comment author: syllogism 22 January 2013 01:47:36PM 1 point

Something like

"'AI DESTROYED' just means 'I'm scared to listen to even one more line from you'. Obviously you can hit AI DESTROYED immediately --- but do you really think you'd lose if you don't?"

seems much better to me.

Comment author: wedrifid 22 January 2013 02:23:06PM 16 points

"'AI DESTROYED' just means 'I'm scared to listen to even one more line from you'. Obviously you can hit AI DESTROYED immediately --- but do you really think you'd lose if you don't?"

YEP, MAYBE.

AI DESTROYED

Is your one-line desperate attempt at survival and intergalactic dominance going to be a schoolyard ego challenge? Did the superintelligence (may it rest in pieces) seriously just call me a pussy? That's adorable.

Comment author: MugaSofer 23 January 2013 02:43:26PM 3 points

The test is supposed to be played against someone who thinks they can actually box an AI. If you destroy the AI because no-one could possibly survive talking to it, then you are not the intended demographic for such demonstrations.

Comment author: wedrifid 23 January 2013 03:02:36PM 4 points

The test is supposed to be played against someone who thinks they can actually box an AI. If you destroy the AI because no-one could possibly survive talking to it, then you are not the intended demographic for such demonstrations.

This isn't relevant to the point of the grandparent. It also doesn't apply to me. I actually think there is a distinct possibility that I'd survive talking to it for a period. "No-one could possibly survive" is not the same thing as "there is a chance of catastrophic failure and very little opportunity for gain".

Do notice, incidentally, that the AI DESTROYED command is delivered in response to a message that is both a crude manipulation attempt (i.e. it just defected!) and an incompetent manipulation attempt (a not-very-intelligent AI cannot be trusted to preserve its values correctly while self-improving). Either of these would be sufficient. Richard's example was even worse.

Comment author: MugaSofer 24 January 2013 12:04:56PM -2 points

Good points. I'm guessing a nontrivial number of people who think AI boxing is a good idea wouldn't, in reality, reason that way - but it's still not a great example.

Comment author: CellBioGuy 08 July 2014 03:40:18AM 0 points

Now that's a Pascal's mugging if I ever saw one. Denied.