One approach to treating an AI ethically is to design it to not be a person. Of course, this means building it the hard way, but, as Tetronian notes, that is already a requirement of making it Friendly.
What are the boundaries of not being a person?
I'm inclined to think that any computer complex enough to be useful will at least have to have a model of itself and a model of what changes to the self (or possibly to the model of itself, which gets to be an interesting distinction) are acceptable. This is at least something like being a person, though presumably it wouldn't need to be able to experience pain.
I'm not going to exclude the possibility of something like pain, though, either -- it might be the most efficient way of modeling "don't do that".
Huh-- this makes p-zombies interesting. Could an AI need qualia?
Eliezer has anticipated your argument:
"Um - okay, look, putting aside the obvious objection that any sufficiently powerful intelligence will be able to model itself -"
Löb's Sentence contains an exact recipe for a copy of itself, including the recipe for the recipe; it has a perfect self-model. Does that make it sentient?
A guide might be how humans have done it up to now. Historically, humans have tended to be reluctant to grant full privileges of humanity even to other humans where they could possibly gain advantage for themselves or their group, until the other humans in question have actually figured out how to shoot back. This may itself be a convincing practical reason to treat AIs nicely.
I'm not sure that an AI would necessarily realize that punching back is the obvious answer. However, I do agree that if you are using evolution or some similar process, then you run the risk of eventually creating one that will. Hence my argument below that this is a bad idea.
I agree with you, but I think your argument is moot because I don't see evolution as a practical way to develop AIs, and especially not Friendly ones. Indeed, if Eliezer and SIAI are correct about the possibility of FOOM, then using evolution to create AIs would be extremely dangerous.
I think if you want "proven friendly" AIs, they would almost have to be evolved because of Rice's Theorem. Compare it to creating a breed of dog that isn't aggressive. I think FOOM fails for the same reason--see the last bit of "Survival Strategies".
As you say, it may not be practical to do so, perhaps because of technological limitations. But imagine a set "personality engine" with a bunch of parameters that affect machine-emotional responses to different stimuli. Genetic programming would be a natural approach to find a good mix of those parameter values for different applications.
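To make the parameter-search idea concrete, here is a minimal sketch. Strictly speaking it is a genetic algorithm over a fixed-length parameter vector rather than genetic programming proper (which evolves programs). The parameter names, the target profile, and the fitness function are all made up for illustration; a real fitness measure would have to come from testing the engine in its application.

```python
import random

# Hypothetical parameter names; nothing in the thread specifies them.
PARAMS = ["fear", "curiosity", "patience"]


def fitness(genome, target):
    # Toy stand-in for "how well this personality suits the application":
    # negative squared distance to a hand-picked target profile (an assumption).
    return -sum((g - t) ** 2 for g, t in zip(genome, target))


def evolve(target, pop_size=30, generations=60, mutation_sd=0.1, seed=0):
    rng = random.Random(seed)  # seeded so runs are repeatable
    population = [[rng.random() for _ in PARAMS] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=lambda g: fitness(g, target), reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            # Uniform crossover, then Gaussian mutation clamped to [0, 1].
            child = [rng.choice(pair) for pair in zip(mom, dad)]
            child = [min(1.0, max(0.0, x + rng.gauss(0, mutation_sd))) for x in child]
            children.append(child)
        population = survivors + children
    return max(population, key=lambda g: fitness(g, target))


if __name__ == "__main__":
    best = evolve(target=[0.2, 0.9, 0.5])
    print(dict(zip(PARAMS, (round(x, 2) for x in best))))
```

Note that this only finds parameter values that score well on the fitness measure it is given; it says nothing about behavior under conditions the measure never tested.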
How is Rice's theorem at all relevant here?
Note: Just because there is no general algorithm to tell whether an arbitrary AI is friendly, doesn't mean it's impossible to construct a friendly AI.
Imagine that you want to construct an AI that will never self-halt (easier to define than friendliness, but the same idea applies). You could build the machine so that it doesn't have an off switch, and therefore can't halt simply out of inability. However, if the machine can self-modify, it could subsequently grant itself the ability to halt. So in your design, you'd have to figure out a way to prevent self-halting under all possible input conditions, under all possible self-modifications of the machine. This latter task cannot be solved in the general case because of Rice's Theorem, and engineering a solution leads to an infinite regress.
So in practice, how can one create a relatively non-suicidal AI? An evolutionary/ecological approach is proven to work: witness biological life. (However, humans, who have the most general computational power, commit suicide at a more or less constant rate.)
In short: genetic programming, or some other such search, can possibly find quasi-solutions (meaning they work under conditions that have been tested) if they exist, but designing in all the required characteristics up front would require tremendous ability to prove outcomes for each specific case. In practice, this debate is probably moot because it'll be a combination of both.
So in your design, you'd have to figure out a way to prevent self-halting under all possible input conditions, under all possible self-modifications of the machine.
Self-modifications are performed by the machine itself. Thus we (and/or the machine) don't need to prove that all possible modifications are non-suicidal. The machine can be programmed to perform only those self-modifications that are provably (in reasonable time) non-suicidal. Rice's theorem doesn't apply in this case.
Edit: However, this leaves the meta-level unpatched. The machine can self-modify into a non-suicidal machine that doesn't care about preserving non-suicidability over modifications. This can be patched by constraining allowed self-modifications to a class of modifications that lead to machines with provably equivalent behavior (with a possible side effect of an inability to self-repair).
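A toy sketch of the "only provably safe self-modifications" idea in this comment. Everything here is hypothetical: the instruction format, the `provably_never_halts` checker, and the `Machine` class are invented for illustration. The checker is deliberately conservative: it accepts only programs it can show to be safe by a trivial rule and rejects everything else, including some programs that are in fact safe. Rice's theorem rules out a checker that is right about every program, not one that is allowed to say "unproven".

```python
HALT = "HALT"


def provably_never_halts(program, budget=1000):
    """Return True only when non-halting has been established.

    Toy proof rule (an assumption): a program is a list of instruction
    strings run in an endless loop, and it can stop only by executing HALT.
    A program with no HALT instruction therefore never self-halts. Anything
    else, or anything too long to scan within the budget, counts as unproven.
    """
    if len(program) > budget:
        return False
    return all(instr != HALT for instr in program)


class Machine:
    def __init__(self, program):
        if not provably_never_halts(program):
            raise ValueError("initial program is not provably non-halting")
        self.program = list(program)

    def propose(self, new_program):
        """Adopt new_program only if the checker can prove it safe."""
        if provably_never_halts(new_program):
            self.program = list(new_program)
            return True
        return False  # unproven modifications are simply not performed


if __name__ == "__main__":
    m = Machine(["INC", "JUMP 0"])
    print(m.propose(["DEC", "JUMP 0"]))  # accepted: no HALT anywhere
    print(m.propose(["INC", "HALT"]))    # rejected: cannot be proven safe
```

In this sketch the gate lives in `propose`, outside the program being modified, so no accepted modification can remove it. That is an assumption standing in for the meta-level patch described in the edit; a machine that can rewrite its own gate would need the stronger constraint.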
The kind of constraint you propose would be very useful. We would first have to prove that there is a kind of topology in general computation (because the machine can change its own language, the solution can't be language-specific) that only allows non-suicidal trajectories under all possible inputs and self-modifications (or perhaps one that allows suicidal trajectories only with low probability, though this is not likely to be computable). I have looked, but not found such a thing in existing theory. There is work on the topology of computation, but it's something different from this. I may just be unaware of it, however.
Note that in the real-world scenario, we also have to worry about entropy battering around the design, so we need a margin of error for that too.
Finally, the finite-time solution is practical, but ultimately not satisfying. The short-term solution to being in a building on fire may be to stay put. The long-term solution may be to risk short-term harm for long-term survival. And so with only short-term solutions, one may end up in a dead end down the road. A practical limit on short-term advance simulation is that one still has to act in real time while the simulation runs. And if you want the simulation to take into account that simulations are occurring, we're back to infinite regress...
You haven't read the sequences, have you? The idea of using evolution to produce safe-enough superintelligences was destroyed quite neatly there, say, here: http://lesswrong.com/lw/td/magical_categories/
Also, when we're talking about artificial intelligences, the time period between the point "They're intelligent enough to have some sort of ethical value" and the point "They're intelligent enough to totally dominate us" is most likely really, really short: I'd say less than 10 years; some would say less than 10 days.
No, didn't read the sequences. I will do that. The link might be better named to something that indicates what it actually is. But I didn't say the AIs would be safe (or super-intelligent, for that matter), and I don't assume they would be. But those who create them may assume that.
But I didn't say the AIs would be safe (or super-intelligent, for that matter)
This sort of disclaimer can protect you in a discussion on the level of armchair philosophy, whose sole purpose is to show off how smart you are, but if you were to actually build an AI, and it went FOOM and tiled the universe with molecular smiley faces, taking all humans apart in the process, the fact that you didn't claim the AI would be safe would not compel the universe to say "that's all right, then" and hit a magic reset button to give you another chance. Which is why we ask the question "Is this AI safe?" and tend to not like ideas that result in a negative answer, even if the idea didn't claim to address that concern.
In the novel Life Artificial I use the following assumptions regarding the creation and employment of AI personalities.
(Note: PDA := AI, Sticky := human)
The second fitness gradient is based on economics and social considerations: can an AI actually earn a living? Otherwise it gets turned off.
As a result of following this line of thinking, it seems obvious that after the initial novelty wears off, AIs will be terribly mistreated (anthropomorphizing, yeah).
It would be very forward-thinking to begin to engineer barriers to such mistreatment, like a PETA for AIs. It is interesting that such an organization already exists, at least on the Internet: ASPCR