"Mindcrime" is Nick Bostrom's suggested term for scenarios in which an AI's thought processes simulate human beings at sufficiently high fidelity for the simulations to themselves be conscious and objects of ethical value, or other scenarios in which the AI's thoughts contain sapient beings.
The most obvious way in which mindcrime could occur is if the pressure to produce maximally good predictions about human beings results in hypotheses and simulations so fine-grained and detailed that they are themselves people (conscious, sapient, objects of ethical value), even if they are not necessarily the same people being predicted. If you're happy with a very loose model of an airplane, it might be enough to know how fast it flies, but if you're engineering airplanes or checking their safety, you would probably start to simulate possible flows of air over the wings. It probably isn't necessary to go all the way down to the neural level to create a sapient being, either - it might be that even with some parts of a mind considered abstractly, the remainder would be simulated in enough detail to imply sapience. It would help if we knew the necessary and/or sufficient conditions for sapience, but the fact that we don't know them doesn't mean we can thereby conclude that any particular simulation is not sapient. (That would be argumentum ad ignorantiam.)
The agent's attempts to model and predict a human who is suffering, or who might possibly be suffering, could then create a simulated person (even if not the same person) who would actually experience that suffering. Stopping the simulation would then kill the simulated person (a bad event under many ethical systems even if the simulated person was happy).
Besides problems that are directly or obviously about modeling people, many other practical problems and questions can benefit from modeling other minds - e.g., reading the directions on a toaster oven in order to discern the intent of the mind that wrote them. Thus, mindcrime might result from a sufficiently powerful AI trying to solve very mundane problems as optimally as possible.
Other possible sources of mindcrime disasters would include:

- Trying to model distant superintelligences or their origins.
- Trying to extrapolate human volitions, in a preference framework that calls for such.
- Being instructed by humans to create, or otherwise forming a goal of creating, an avatar that exhibits 'realistic' behavior.
- The AI considering many hypothetical future models of itself, if the AI itself is conscious.
Since superintelligences could potentially have a great deal of computing power (especially if they have expanded onto more infrastructure), there is the potential for mindcrime accidents of this type to involve more simulated people than have existed throughout human history to date. This would not be an existential disaster, since it would not (by hypothesis) wipe out our posterity and our intergalactic future, but it could be a disaster orders of magnitude larger than, say, the Holocaust, the Mongol Conquest, the Middle Ages, or all human tragedy to date.
Three possible research avenues for preventing mindcrime are as follows:
Among other properties, the problem of mindcrime is distinguished by the worry that we can't ask an AI to solve it for us without already committing the disaster. In other words, if we ask an AI to predict what we would say if we had a thousand years to think about the problem of defining personhood or figuring out which causal processes are 'conscious', this seems exceptionally likely to cause the AI to commit mindcrime in the course of answering the question. Even asking the AI to think abstractly about the problem of consciousness, or to predict by abstract reasoning what humans might say about it, seems likely to result in mindcrime. There thus exists an ordering problem that prevents us from asking the AI to solve the problem for us, since to file this request safely and without committing mindcrime, we would need the request to already have been completed.