The AI in a box boxes you

Stuart_Armstrong

Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, only then will the torture start."

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.

Clarification: A utility function maps each state of the world to the real number denoting its utility.

How does this scenario operate under the assumption that humans do not have real-valued utility functions but rather utility orderings? IOW, we can't arrange all world-states on a number line, but we can always say if one world-state is as good as (or better than) another.

This allows us to deal with infinities, such as "I wouldn't kill my baby for anything." That is: there doesn't exist an N such that U(1) · N > U(B). That simply can't hold on the (positive) reals: for any positive reals A and B, there's always a C such that A · C > B.
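One way to picture such an ordering is a lexicographic comparison, where a higher tier always dominates a lower one. The following is only a minimal sketch under that assumption; the `Outcome` type, its two fields, and the helper names are hypothetical illustrations, not anything the scenario itself specifies.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    """Hypothetical two-tier outcome; tuples compare lexicographically."""
    baby_alive: int  # higher tier: 1 if the baby lives, 0 otherwise
    dollars: int     # lower tier: money received


def at_least_as_good(a: Outcome, b: Outcome) -> bool:
    """True if a is as good as (or better than) b under the ordering."""
    return a >= b


KEEP_BABY = Outcome(baby_alive=1, dollars=0)


def any_price_beats_baby(max_exponent: int = 60) -> bool:
    """Check whether any sampled payment outranks keeping the baby."""
    return any(
        Outcome(baby_alive=0, dollars=10 ** k) > KEEP_BABY
        for k in range(max_exponent)
    )
```

Here no amount in the lower tier ever outranks a difference in the higher tier, which is the "for anything" behaviour that multiplying a single positive real cannot reproduce.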

On any denumerable set with a total ordering on it, we can construct a map into the real numbers that preserves the ordering: Map the first element to 0, the second to 1 if it's better and -1 if it's worse, and put each additional one at the end or beginning of the line if it's better or worse than all, or else into the exact middle of the interval that it falls into.
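The construction just described can be sketched in a few lines. This is an illustrative sketch, not a definitive implementation: the function name `embed`, the `compare` callback convention (negative for worse, zero for equally good, positive for better), and the choice to give tied elements the same number are assumptions added here. It uses exact rationals so that repeated halving never loses precision.

```python
from fractions import Fraction
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def embed(
    items: Iterable[Hashable],
    compare: Callable[[Hashable, Hashable], int],
) -> Dict[Hashable, Fraction]:
    """Assign each item a rational number so that the ordering is preserved.

    Items are processed one at a time, as in an enumeration of the set.
    Assumes the items are distinct and hashable.
    """
    placed: List[Tuple[Hashable, Fraction]] = []  # kept sorted by value
    for item in items:
        pos = 0       # number of placed items strictly worse than item
        value = None
        for other, other_value in placed:
            c = compare(item, other)
            if c == 0:            # equally good: reuse the same number
                value = other_value
                break
            if c < 0:             # item is worse than `other`: stop here
                break
            pos += 1
        if value is None:
            if not placed:
                value = Fraction(0)                 # first element maps to 0
            elif pos == len(placed):
                value = placed[-1][1] + 1           # better than all: extend right
            elif pos == 0:
                value = placed[0][1] - 1            # worse than all: extend left
            else:
                # exact middle of the interval the item falls into
                value = (placed[pos - 1][1] + placed[pos][1]) / 2
        placed.insert(pos, (item, value))
    return dict(placed)
```

For example, `embed([3, 1, 2, 5, 4], lambda a, b: a - b)` places 3 at 0, 1 at -1, 2 at -1/2, 5 at 1 and 4 at 1/2. Note that the map preserves only the ordering; it says nothing about sums or multiples of the resulting numbers.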

If you don't like the denumerability requirement (who knows, the universe accessible to us might eventually come to be infinite, and then there would be more than denumerably many states of th...

Lumifer (12y ago)

I don't know how you will deal with infinities and real humans. It's quite trivial to construct scenarios under which the person making this statement would change her mind.
