At http://lesswrong.com/lw/ru/the_bedrock_of_fairness/ldy Eliezer mentions two challenges he often gets, "Friendly to who?" and "Oh, so you get to say what 'Friendly' means." At the moment I see only one true answer to these questions, which I give below. If you can propose alternatives in the comments, please do.
I suspect morality is in practice a multiplayer game, so talking about it needs multiple people to be involved. Therefore, let's imagine a dialogue between A and B.
A: Okay, so you're interested in Friendly AI. Who will it be Friendly toward?
B: Obviously the people who participate in making the system will decide how to program it, so they will decide who it is Friendly toward.
A: So the people who make the system decide what "Friendly" means?
B: Yes.
A: Then they could decide that it will be Friendly only toward them, or toward White people. Isn't that sort of selfishness or racism immoral?
B: I can try to answer questions about the world, so if you can define morality so I can do experiments to discover what is moral and what is immoral, I can try to guess the results of those experiments and report them. What do you mean by morality?
A: I don't know. If it doesn't mean anything, why do people talk about morality so much?
B: People often profess beliefs to label themselves as members of a group. So far as I can tell, the belief that some things are moral and other things are not is one of those beliefs. I don't have any other explanation for why people talk so much about something that isn't subject to experimentation.
A: So if that's what morality is, then it's fundamentally meaningless unless I'm planning out what lies to tell in order to get positive regard from a potential ingroup, or better yet I manage to somehow deceive myself so I can truthfully conform to the consensus morality of my desired ingroup. If that's all it is, there's no constraint on how a Friendly AI works, right? Maybe you'll build it and it will only be Friendly toward B.
B: No, because I can't do it by myself. Suppose I approach you and say "I'm going to make a Friendly AI that lets me control it and doesn't care about anyone else's preference." Would you help me?
A: Obviously not.
B: Nobody else would either, so the only way I can unilaterally run the world with an FAI is to create it by myself, and I'm not up to that. There are a few other proposed notions of Friendliness that are nonviable for similar reasons. For example, suppose I approached you and said "I'm going to make a Friendly AI that treats everyone fairly, but I don't want to let anybody inspect how it works." Would you help me?
A: No, because I wouldn't trust you. I'd assume that you plan to really make it Friendly only toward yourself, lie about it, and then drop the lie once the FAI had enough power that you didn't need the lie any more.
B: Right. Here's an ethical system that fails another way: "I'll make an FAI that cares about every human equally, no matter what they do." To keep it simple, let's assume that engineering humans to have strange desires for the purpose of manipulating the FAI is not possible. Would you help me build that?
A: Well, it fits with my intuitive notion of morality, but it's not clear what incentive I have to help. If you succeed, I seem to win equally at the end whether I help you or not. Why bother?
B: Right. There are several possible fixes for that. Perhaps if I don't get your help, I won't succeed, and the alternative is that someone else builds it poorly and your quality of life decreases dramatically. That gives you an incentive to help.
A: Not much of one. You'll surely need a lot of help, and maybe if all those other people help I won't have to. Everyone would make the same decision and nobody would help.
B: Right. I could solve that problem by paying helpers like you money, if I had enough money. Another option would be to tilt the Friendliness in the direction of helpers in proportion to how much they help me.
A: But isn't tilting the Friendliness unfair?
B: Depends. Do you want things to be fair?
A: Yes, for some intuitive notion of "fairness" I can't easily describe.
B: So if the AI cares what you want, that will cause it to figure out what you mean by "fair" and tend to make it happen, with that tendency increasing as it tilts more in your favor, right?
A: I suppose so. No matter what I want, if the AI cares enough about me, it will give me more of what I want, including fairness.
B: Yes, that's the best idea I have right now. Here's another alternative: What would happen if we only took action when there's a consensus about how to weight the fairness?
A: Well, 4% of the population are sociopaths. They, and perhaps others, would make ridiculous demands and prevent any consensus. Then we'd be waiting forever to build this thing, and someone else who doesn't care about consensus would move while we're dithering and make us irrelevant. So we'll have to take action and do something reasonable without having a consensus about what that is. So how about it? Do you need help yet?
B: Nope, I don't know how to make it.
A: Damn. Hmm, do you think you'll figure it out before everybody else?
B: Probably not. There are a lot of everybody else. In particular, business organizations that optimize for profit have a lot of power and have fundamentally inhuman value systems. I don't see how I can take action before all of them.
A: Me either. We are so screwed.
I agree that you have to renorm everyone's wants for this to work. I also agree that if you can construct broken minds for the purpose of manipulating the FAI, we need provisions to guard against that. My preferred alternative at the moment follows:
Before people become able to construct broken minds, the FAI cares about everything that's genetically human.
After we find the first genetically human mind deliberately broken for the purpose of manipulating the FAI, we estimate when the FAI started to be influenced by that, and, retroactive to just before that time, we introduce a new policy: new individuals start out with a weight of 0 and can receive weight transferred from their parents, so the total weight is conserved. I don't want an economy of weight-transfer to arise, so it would be a one-way irreversible transfer.
This might lead to a few people running around with a weight of 0 because their parents never made the transfer. This would be suboptimal, but it would not have horrible consequences, because the AI would care for the parents, who probably care for the new child, so the AI would in effect care some for the new child.
Death of the parents doesn't break this. Caring about the preference of dead people is not a special case.
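To make the bookkeeping concrete, here is a minimal sketch of the weight-transfer rule in Python. Everything in it is my own assumption for illustration: the class name WeightLedger, the method names, the choice of 1.0 as the starting weight for people who existed before the cutoff, and the use of names as identifiers. It only models the accounting (new individuals start at 0, weight flows one way from parent to child, the total is conserved); it says nothing about how an FAI would actually use the weights.

```python
from dataclasses import dataclass, field


@dataclass
class WeightLedger:
    """Hypothetical ledger of how much the FAI weighs each person's wants."""

    weights: dict = field(default_factory=dict)   # name -> current weight
    parents: dict = field(default_factory=dict)   # name -> frozenset of parent names

    def add_founder(self, name, weight=1.0):
        """Register someone who existed before the cutoff (assumed weight 1.0)."""
        if name in self.weights:
            raise ValueError(f"{name} is already registered")
        self.weights[name] = weight
        self.parents[name] = frozenset()

    def add_child(self, name, parent_names):
        """Register a new individual after the cutoff; they start with weight 0."""
        if name in self.weights:
            raise ValueError(f"{name} is already registered")
        for parent in parent_names:
            if parent not in self.weights:
                raise ValueError(f"unknown parent {parent}")
        self.weights[name] = 0.0
        self.parents[name] = frozenset(parent_names)

    def transfer(self, giver, receiver, amount):
        """Move weight from a parent to their child.

        Transfers are only allowed parent -> child, so weight can never flow
        back, and the giver cannot hand over more than they hold, so the total
        is conserved.
        """
        if giver not in self.parents.get(receiver, frozenset()):
            raise ValueError("weight may only flow from a parent to their child")
        if amount <= 0 or amount > self.weights[giver]:
            raise ValueError("amount must be positive and no more than the giver holds")
        self.weights[giver] -= amount
        self.weights[receiver] += amount

    def total(self):
        """Total weight across everyone, living or dead."""
        return sum(self.weights.values())


ledger = WeightLedger()
ledger.add_founder("alice")
ledger.add_founder("bob")
ledger.add_child("carol", ["alice", "bob"])
ledger.transfer("alice", "carol", 0.25)
print(ledger.weights, ledger.total())
```

Note that the ledger never removes anyone, which matches the point above that dead people's preferences are not a special case: a parent's remaining weight simply stays on the books.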
I encourage people to reply to this post with bugs in this alternative or with other plausible alternatives.