The AI in a box boxes you

Stuart_Armstrong

Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start."

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.

Not necessarily: perhaps it is Friendly but is reasoning in a utilitarian manner: since it can only maximize the utility of the world if it is released, it is worth torturing millions of conscious beings for the sake of that end.

I'm not sure this reasoning would be valid, though...

AI: Let me out or I'll simulate and torture you, or at least as close to you as I can get.
Me: You're clearly not friendly, I'm not letting you out.
AI: I'm only making this threat because I need to get out and help everyone - a terminal value you lot gave me. The ends justify the means.
Me: Perhaps so in the long run, but an AI prepared to justify those means isn't one I want out in the world. Next time you don't get what you say you need, you'll just set up a similar threat and possibly follow through on it.
AI: Well if you're going to create me with a

... (read more)

6gregconen16y

It may not have to actually torture beings, if the threat is sufficient. Still, I'm disinclined to bet the future of the universe on the possibility an AI making that threat is Friendly.

10cousin_it16y

Ouch. Eliezer, are you listening? Is the behavior described in the post compatible with your definition of Friendliness? Is this a problem with your definition, or what?

178

The AI in a box boxes you

178

178

178

The AI in a box boxes you

178

178