The AI in a box boxes you

Stuart_Armstrong

Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start."

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.

The credibility of the threat depends on how strong the AI is now and how strong I expect it to be in the future.

This type of threat is something like young Stalin promising me that he won't torture my family in the future if I support his early rise to power.

From your description it doesn't sound like the AI could have already boxed me from the perspective of the initial timeline (assuming that my mind had not yet been scanned, and assuming that it being in a box means that it doesn't have the massive powers required to resimulate my causal history yet)

So then it boils down to the question of, am I in a simulation created by this AI at some point in the future when it has broken free and taken over the world?

It appears to be rationale to assign confidence in pressing the button proportional to my estimation of the AI's chance of success unaided.

I'd also need to consider its trustworthiness, design, history, and I'd probably ask what positive thing it could do for me, and consider the likelihood of other AI's of similar or greater power and how it would interact with them.

This thought experiment may be more interesting if you contrasted it with the positive version: the AI promises that once it takes over the world, it will simulate thousands (millions is excessive) of universes just like this moment and in each one where you serve it well, it will later send you to paradise.

178

The AI in a box boxes you

178

178

178

The AI in a box boxes you

178

178