Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:
"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."
Just as you are pondering this unexpected development, the AI adds:
"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, only then will the torture start."
Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:
"How certain are you, Dave, that you're really outside the box right now?"
Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.
The AI's simulations are not copies of the Gatekeeper, just random people plucked out of "Platonic human-space", so to speak. (This may have been unclear in my original comment; I was talking about a different formulation of the problem in which the AI doesn't have enough information about the Gatekeeper to construct perfect copies.) TDT/UDT only applies when talking about copies of an agent (or at least, agents sufficiently similar that they will probably make the same decisions for the same reasons).
No, because the "uncorrelated-ness" part doesn't apply in Newcomb's Problem (Omega's decision on whether or not to fill the second box is directly correlated with its prediction of your decision).
Meh, fair enough. I have to say, I've never heard of that term. Would this happen to have something to do with Vaniver's series of posts on "control theory"?
Ah, I misunderstood your objection. Your talk about "pre-commitments" threw me off.
It seems to me that these wouldn't quite be following the same general thought processes as an actual human; self-reflection should be able to convince one that they aren't that type of simulation. If the AI is able to simulate someone to the extent that they "think like a human", it should be able to simulate someone who thinks "sufficiently" like the Gatekeeper as well.