The AI in a box boxes you

Stuart_Armstrong

Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start."

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.

Point 3 is invalid. If the AI makes the threat, it doesn't mean that it has made the simulation already and knows your answer. Maybe it is exhausting for the AI to simulate you, and will only do it if you don't let it out.

Point 2 is actually also invalid. As people sometimes fulfil threats as a pure act of vengeance, without hope of actually improving something, there is no reason to assume that the AI will be different. At least it wasn't stated in the premises of the scenario.

I suppose those two points rely on assumptions I made about the theoretical AIs behavior. I was thinking the AI acts in ways to optimize it's release chance. If it does not do this, then yes those points are problematic.

0nazgulnarsil16y

vengeance is a means to raise the perceived cost of attacking you. it basically says "if you attack me, I will experience emotions that cause me to devote an inordinate amount of resources making your life miserable".

178

The AI in a box boxes you

178

178

178

The AI in a box boxes you

178

178