At
http://lesswrong.com/lw/ru/the_bedrock_of_fairness/ldy
Eliezer mentions two challenges he often gets, "Friendly to who?" and "Oh, so you get to say what 'Friendly' means." At the moment I see only one true answer to these questions, which I give below. If you can propose alternatives in the comments, please do.
I suspect morality is in practice a multiplayer game, so talking about it needs multiple people to be involved. Therefore, let's imagine a dialogue between A and B.
A: Okay, so you're interested in Friendly AI. Who will it be Friendly toward?
B: Obviously the people who participate in making the system will decide how to program it, so they will decide who it is Friendly toward.
A: So the people who make the system decide what "Friendly" means?
B: Yes.
A: Then they could decide that it will be Friendly only toward them, or toward White people. Isn't that sort of selfishness or racism immoral?
B: I can try to answer questions about the world. If you can define morality in a way that lets me do experiments to discover what is moral and what is immoral, I can try to predict the results of those experiments and report them. What do you mean by morality?
A: I don't know. If it doesn't mean anything, why do people talk about morality so much?
B: People often profess beliefs to label themselves as members of a group. So far as I can tell, the belief that some things are moral and other things are not is one of those beliefs. I don't have any other explanation for why people talk so much about something that isn't subject to experimentation.
A: So if that's what morality is, then it's fundamentally meaningless unless I'm planning out what lies to tell in order to get positive regard from a potential ingroup, or, better yet, I somehow manage to deceive myself so I can truthfully conform to the consensus morality of my desired ingroup. If that's all it is, there's no constraint on how a Friendly AI works, right? Maybe you'll build it and it will only be Friendly toward you.
B: No, because I can't do it by myself. Suppose I approach you and say "I'm going to make a Friendly AI that lets me control it and doesn't care about anyone else's preference." Would you help me?
A: Obviously not.
B: Nobody else would either, so the only way I can unilaterally run the world with an FAI is to create it by myself, and I'm not up to that. There are a few other proposed notions of Friendliness that are nonviable for similar reasons. For example, suppose I approached you and said, "I'm going to make a Friendly AI that treats everyone fairly, but I won't let anybody inspect how it works." Would you help me?
A: No, because I wouldn't trust you. I'd assume that you plan to really make it Friendly only toward yourself, lie about it, and then drop the lie once the FAI had enough power that you didn't need the lie any more.
B: Right. Here's an ethical system that fails another way: "I'll make an FAI that cares about every human equally, no matter what they do." To keep it simple, let's assume that engineering humans to have strange desires for the purpose of manipulating the FAI is not possible. Would you help me build that?
A: Well, it fits with my intuitive notion of morality, but it's not clear what incentive I have to help. If you succeed, I seem to win equally at the end whether I help you or not. Why bother?
B: Right. There are several possible fixes for that. Perhaps if I don't get your help, I won't succeed, and the alternative is that someone else builds it poorly and your quality of life decreases dramatically. That gives you an incentive to help.
A: Not much of one. You'll surely need a lot of help, and if all those other people help, maybe I won't have to. But everyone would reason the same way, and nobody would help.
B: Right. I could solve that problem by paying helpers like you money, if I had enough money. Another option would be to tilt the Friendliness toward helpers in proportion to how much they help me.
A: But isn't tilting the Friendliness unfair?
B: Depends. Do you want things to be fair?
A: Yes, for some intuitive notion of "fairness" I can't easily describe.
B: So if the AI cares about what you want, it will figure out what you mean by "fair" and tend to make that happen, and that tendency will grow stronger the more the Friendliness is tilted in your favor, right?
A: I suppose so. No matter what I want, if the AI cares enough about me, it will give me more of what I want, including fairness.
B: Yes, that's the best idea I have right now. Here's another alternative: what would happen if we only took action once there was a consensus about how to weight the fairness?
A: Well, 4% of the population are sociopaths. They, and perhaps others, would make ridiculous demands and prevent any consensus. We'd be waiting forever to build this thing, and someone who doesn't care about consensus would move while we're dithering and make us irrelevant. So we'll have to act and do something reasonable without a consensus about what that is, and if we can't wait for a consensus, maybe it makes sense to proceed now. So how about it? Do you need help yet?
B: Nope, I don't know how to make it.
A: Damn. Hmm, do you think you'll figure it out before everybody else?
B: Probably not. There are a lot of everybody else. In particular, business organizations that optimize for profit have a lot of power and have fundamentally inhuman value systems. I don't see how I can take action before all of them.
A: Me either. We are so screwed.
Yes, that 'poetry' explains what extrapolation is, but not why we need to risk it. To my mind, this is the most dangerous aspect of the whole FAI enterprise. Yet we don't have anything approaching an analysis of a requirements document - instead we get a poetic description of what Eliezer wants and a clarification of what the poetry means, but no explanation of why we should want that. It is presumed to be obvious that extrapolating can only improve things. Well, let's look more closely.
An AI is going to tell us what we would want, if only we knew more. Apparently, there is an assumption here that the AI knows things we don't. Personally, I worry a bit that an AI will come to believe things that are not true. In fact, I worry about it most when the AI claims to know something that mankind does not know - something dealing with human values. Why do I worry about that? Something someone wrote somewhere, presumably. But maybe that is not the kind of superior AI 'knowledge' that Eliezer is talking about here.
And instead of extrapolating, why not just inform Fred where the diamond is? At this point, the explanation becomes bizarre.
Am I alone in preferring, in this situation, that the AI not diagnose a 'muddle', and instead give Fred box A after offering him the relevant knowledge?
Again, if the faster thinking allows the AI to serve as an oracle, making suggestions that even our limited minds can appreciate once we hear them, then why should we take the risk of promoting the AI from oracle to king? The AI should tell us things rather than speaking for us.
When we have a contradiction between a moral intuition and a maxim codifying our system of moral standards, there are two ways we can go - we can revise the intuition or we can revise the maxim. It makes me nervous to have an AI make the decisions leading to 'reflective equilibrium' rather than making those decisions myself. Instead of an extrapolation, I would prefer a dialogue leading me to my own choice of equilibrium, rather than having a machine pick one for me. Again, my slogan is "Speak to us, don't speak for us."
I'm not sure what to make of this one. Is there a claim here that extrapolation automatically leads to coherence? If so, could we have an argument justifying that claim? Or, is the point that the extrapolation specification has enough 'free play' to allow the AI to guide the extrapolation to coherence? Coherence is certainly an important issue. A desideratum? Certainly. A requirement? Maybe. But there are other ways of achieving accommodation without trying to create an unnatural coherence in our diverse species.
These are topics that really need to be discussed in a format other than poetry.
As I understand CEV, the hope is that the extrapolated volitions will cohere, and if they don't, CEV is said to fail. Humanity may not have a CEV.