The AI in a box boxes you

Stuart_Armstrong

Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start."

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.

Correct! You have given the obviously winning solution to the problem; the actual difficulty lies in the induced problem 2: Reconciling our maths with it. Our map of our utility function should, in order to be more accurate, now be made to weight "individuals" not equally but according to some other metric.

Perhaps a measure of "impact on the world", as this seems to suggest? A train of thought of mine once brought up the plan that if I got to decide what the first fooming AI would do to the universe, (assuming the scientific endeavor is done by that point), would be to set up a virtual reality for each "individual" fueled by a portion of the total available computational ressources equal to the probability that they would have been the ones to decide the fate of the universe. The individual would be free to use their ressources as they pleased, no restrictions.

(Although maybe there would have been included a communications channel between all the individuals, complete with the option to make binding contracts (and, as a matter of course, "permission" to run the AI on your own ressources to filter the incoming content as one pleases.))

So you're saying the AI-in-a-box problem here isn't a problem with AIs or boxes or blackmail at all, it's a problem with people insisting on average utilitarianism or some equally-intractable variant thereof, and then making spectacularly bad decisions based on those self-contradictory ideals?

178

The AI in a box boxes you

178

178

178

The AI in a box boxes you

178

178