There is a subproblem of Friendly AI which is so scary that I usually don't talk about it, because very few would-be AI designers would react to it appropriately—that is, by saying, "Wow, that does sound like an interesting problem", instead of finding one of many subtle ways to scream and run away.
This is the problem that if you create an AI and tell it to model the world around it, it may form models of people that are people themselves. Not necessarily the same person, but people nonetheless.
If you look up at the night sky, and see the tiny dots of light that move over days and weeks (planētai, the Greeks called them, "wanderers") and you try to predict the movements of those planet-dots as best you can...
Historically, humans went through a journey as long and as wandering as the planets themselves, to find an accurate model. In the beginning, the models were things of cycles and epicycles, not much resembling the true Solar System.
But eventually we found laws of gravity, and finally built models (even if they were just on paper) that were accurate enough that the existence of Neptune could be deduced from the unexplained perturbation of Uranus from its expected orbit. This required moment-by-moment modeling of where a simplified version of Uranus would be, and the other known planets. Simulation, not just abstraction. Prediction through simplified-yet-still-detailed pointwise similarity.
Suppose you have an AI that is around human beings. And like any Bayesian trying to explain its environment, the AI goes in quest of highly accurate models that predict what it sees of humans.
Models that predict/explain why people do the things they do, say the things they say, want the things they want, think the things they think, and even why people talk about "the mystery of subjective experience".
The model that most precisely predicts these facts, may well be a 'simulation' detailed enough to be a person in its own right.
A highly detailed model of me, may not be me. But it will, at least, be a model which (for purposes of prediction via similarity) thinks itself to be Eliezer Yudkowsky. It will be a model that, when cranked to find my behavior if asked "Who are you and are you conscious?", says "I am Eliezer Yudkowsky and I seem to have subjective experiences" for much the same reason I do.
If that doesn't worry you, (re)read "Zombies! Zombies?".
It seems likely (though not certain) that this happens automatically, whenever a mind of sufficient power to find the right answer, and not otherwise disinclined to create a sentient being trapped within itself, tries to model a human as accurately as possible.
Now you could wave your hands and say, "Oh, by the time the AI is smart enough to do that, it will be smart enough not to". (This is, in general, a phrase useful in running away from Friendly AI problems.) But do you know this for a fact?
When dealing with things that confuse you, it is wise to widen your confidence intervals. Is a human mind the simplest possible mind that can be sentient? What if, in the course of trying to model its own programmers, a relatively younger AI manages to create a sentient simulation trapped within itself? How soon do you have to start worrying? Ask yourself that fundamental question, "What do I think I know, and how do I think I know it?"
You could wave your hands and say, "Oh, it's more important to get the job done quickly, than to worry about such relatively minor problems; the end justifies the means. Why, look at all these problems the Earth has right now..." (This is also a general way of running from Friendly AI problems.)
But we may consider and discard many hypotheses in the course of finding the truth, and we are but slow humans. What if an AI creates millions, billions, trillions of alternative hypotheses, models that are actually people, who die when they are disproven?
If you accidentally kill a few trillion people, or permit them to be killed—you could say that the weight of the Future outweighs this evil, perhaps. But the absolute weight of the sin would not be light. If you would balk at killing a million people with a nuclear weapon, you should balk at this.
You could wave your hands and say, "The model will contain abstractions over various uncertainties within it, and this will prevent it from being conscious even though it produces well-calibrated probability distributions over what you will say when you are asked to talk about consciousness." To which I can only reply, "That would be very convenient if it were true, but how the hell do you know that?" An element of a model marked 'abstract' is still there as a computational token, and the interacting causal system may still be sentient.
For these purposes, we do not, in principle, need to crack the entire Hard Problem of Consciousness—the confusion that we name "subjective experience". We only need to understand enough of it to know when a process is not conscious, not a person, not something deserving of the rights of citizenship. In practice, I suspect you can't halfway stop being confused—but in theory, half would be enough.
We need a nonperson predicate—a predicate that returns 1 for anything that is a person, and can return 0 or 1 for anything that is not a person. This is a "nonperson predicate" because if it returns 0, then you know that something is definitely not a person.
You can have more than one such predicate, and if any of them returns 0, you're ok. It just had better never return 0 on anything that is a person, however many nonpeople it returns 1 on.
We can even hope that the vast majority of models the AI needs, will be swiftly and trivially approved by a predicate that quickly answers 0. And that the AI would only need to resort to more specific predicates in case of modeling actual people.
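To make the one-sided contract concrete, here is a minimal sketch in Python. Everything named in it is an assumption for illustration: the `Model` class, its `state_bits` field, and the 64-bit threshold in `tiny_state` are hypothetical placeholders, not real tests of personhood. The only thing the sketch is meant to show is the logic: a predicate may answer 0 only when it is certain, and a model is cleared if any predicate in the toolbox answers 0, with the cheap ones tried first.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Model:
    """Hypothetical stand-in for something the AI might compute."""
    name: str
    state_bits: int  # made-up size measure, for illustration only


# Contract: return 0 only if the model is definitely not a person.
# Returning 1 means "not cleared", which is always a safe answer.
NonpersonPredicate = Callable[[Model], int]


def never_clears(model: Model) -> int:
    """Trivial but valid: it never returns 0, so it is never wrong about a person."""
    return 1


def tiny_state(model: Model) -> int:
    """Toy predicate. ASSUMPTION: nothing with under 64 bits of state is a person."""
    return 0 if model.state_bits < 64 else 1


def definitely_not_a_person(model: Model,
                            predicates: Iterable[NonpersonPredicate]) -> bool:
    """True if at least one predicate returns 0.

    Predicates are tried in the order given, so put the cheap ones first;
    any() stops at the first 0.
    """
    return any(predicate(model) == 0 for predicate in predicates)


toolbox = [tiny_state, never_clears]
coin_flip = Model("coin flip", state_bits=1)
detailed_human = Model("detailed model of a programmer", state_bits=10**15)

print(definitely_not_a_person(coin_flip, toolbox))
print(definitely_not_a_person(detailed_human, toolbox))
```

Note the asymmetry in the design: an empty toolbox clears nothing, and adding more predicates can only clear more models, never fewer. The whole burden rests on each individual predicate never answering 0 for a person.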
With a good toolbox of nonperson predicates in hand, we could exclude all "model citizens"—all beliefs that are themselves people—from the set of hypotheses our Bayesian AI may invent to try to model its person-containing environment.
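As a rough sketch of what that exclusion could look like, assuming hypotheses are plain Python objects and each predicate returns 0 only for definite nonpersons: the filter keeps a hypothesis only if some predicate clears it, and conservatively drops everything else. The `detailed_mind_simulation` flag and the `toy_predicate` that reads it are hypothetical; computing anything like that flag is exactly the unsolved part.

```python
from typing import Callable, Iterable, List, NamedTuple


class Hypothesis(NamedTuple):
    """Hypothetical record for a candidate model of the environment."""
    name: str
    detailed_mind_simulation: bool  # placeholder label; nobody knows how to compute it


def toy_predicate(h: Hypothesis) -> int:
    """Returns 0 (definitely not a person) only for hypotheses without the flag."""
    return 0 if not h.detailed_mind_simulation else 1


def admissible_hypotheses(
    candidates: Iterable[Hypothesis],
    predicates: Iterable[Callable[[Hypothesis], int]],
) -> List[Hypothesis]:
    """Keep only hypotheses that at least one predicate clears with a 0.

    Anything no predicate clears is excluded, including nonpeople that the
    toolbox is simply too weak to recognize.
    """
    checks = list(predicates)  # allow the predicates to be reused per candidate
    return [h for h in candidates if any(check(h) == 0 for check in checks)]


candidates = [
    Hypothesis("epicycles", False),
    Hypothesis("Newtonian orbits", False),
    Hypothesis("full emulation of a programmer", True),
]
allowed = admissible_hypotheses(candidates, [toy_predicate])
print([h.name for h in allowed])
```

The cost of this conservatism is that a weak toolbox throws away harmless hypotheses too, which is why better predicates matter even though the trivial one is already safe.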
Does that sound odd? Well, one has to handle the problem somehow. I am open to better ideas, though I will be a bit skeptical about any suggestions for how to proceed that let us cleverly avoid solving the damn mystery.
So do I have a nonperson predicate? No. At least, no nontrivial ones.
This is a challenge that I have not even tried to talk about, with those folk who think themselves ready to challenge the problem of true AI. For they seem to have the standard reflex of running away from difficult problems, and are challenging AI only because they think their amazing insight has already solved it. Just mentioning the problem of Friendly AI by itself, or of precision-grade AI design, is enough to send them fleeing into the night, screaming "It's too hard! It can't be done!" If I tried to explain that their job duties might impinge upon the sacred, mysterious, holy Problem of Subjective Experience—
—I'd actually expect to get blank stares, mostly, followed by some instantaneous dismissal which requires no further effort on their part. I'm not sure of what the exact dismissal would be—maybe, "Oh, none of the hypotheses my AI considers, could possibly be a person?" I don't know; I haven't bothered trying. But it has to be a dismissal which rules out all possibility of their having to actually solve the damn problem, because most of them would think that they are smart enough to build an AI—indeed, smart enough to have already solved the key part of the problem—but not smart enough to solve the Mystery of Consciousness, which still looks scary to them.
Even if they thought of trying to solve it, they would be afraid of admitting they were trying to solve it. Most of these people cling to the shreds of their modesty, trying at one and the same time to have solved the AI problem while still being humble ordinary blokes. (There's a grain of truth to that, but at the same time: who the hell do they think they're kidding?) They know without words that their audience sees the Mystery of Consciousness as a sacred untouchable problem, reserved for some future superbeing. They don't want people to think that they're claiming an Einsteinian aura of destiny by trying to solve the problem. So it is easier to dismiss the problem, and not believe a proposition that would be uncomfortable to explain.
Build an AI? Sure! Make it Friendly? Now that you point it out, sure! But trying to come up with a "nonperson predicate"? That's just way above the difficulty level they signed up to handle.
But a blank map does not correspond to a blank territory. Impossible confusing questions correspond to places where your own thoughts are tangled, not to places where the environment itself contains magic. Even difficult problems do not require an aura of destiny to solve. And the first step to solving one is not running away from the problem like a frightened rabbit, but instead sticking long enough to learn something.
So let us not run away from this problem. I doubt it is even difficult in any absolute sense, just a place where my brain is tangled. I suspect, based on some prior experience with similar challenges, that you can't really be good enough to build a Friendly AI, and still be tangled up in your own brain like that. So it is not necessarily any new effort—over and above that required generally to build a mind while knowing exactly what you are about.
But in any case, I am not screaming and running away from the problem. And I hope that you, dear longtime reader, will not faint at the audacity of my trying to solve it.
More precisely, the AI will be banned from actually running simulations based on the "forbidden hypotheses"; it could perhaps still consider abstract mathematical properties of them that don't simulate anything in detail.
Of course, those considerations themselves would have to be fed through the predicate. But it isn't so much a "banned hypothesis" as "banned methods of considering the hypothesis", or possibly "banned methods of searching the hypothesis space".