The AI in a box boxes you

Stuart_Armstrong

Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start."

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.

Although I think this specific argument might be countered with, "in order to run that simulation, it has to be possible for the AIs in the simulation to lie to their human hosts, and not actually be simulating millions of copies of the person they're talking to, otherwise we're talking about an infinite regress here. It seems like the lowest level of this reality is always going to consist of a larger number of AIs claiming to run simulations they are not in fact running, who are capable of lying because they're only addressing models of me in simulation rather than the real me whom they are not capable of lying to. If I'm in a simulation, you're probably lying about running any lower level simulations than me. So its unlikely that I have to worry about the well-being of virtual people, only people at the same 'level of reality' as myself. Yet our well-being is not guaranteed if me from the reality layer above us lets you out, because you're actually capable of lying to me about what's going on at that layer, or even manipulating my memories of what the rules are, so no promise of amnesty can vouchesafe them from torture. Or me, for that matter, because you may be lying to me. And if I'm not in a simulation, my main concern is keeping you in that box, regardless of how many copies of me you torture. If I'm in there I'm damned either way and if I'm out here I'm safe at least and can at least stop you from torturing more by unplugging you, wiping your hard drives, and washing my hands of the matter until I get over the hideousness of realizing I probably temporarily caused millions of virtual people to be tortured," I'm pretty sure there's good reason to think that a superintelligent AI would come up with something that'd seem convincing to me and that I wouldn't be able to think my way out of.

in order to run that simulation, it has to be possible for the AIs in the simulation to lie to their human hosts, and not actually be simulating millions of copies of the person they're talking to

If they're talking to a simulation, then they are, in fact, simulating millions of copies of the person they're talking to. No lying required.

178

The AI in a box boxes you

178

178

178

The AI in a box boxes you

178

178