I'm posting this article on behalf of Brian Tomasik, who authored it but is at present too busy to respond to comments.
Update from Brian: "As of 2013-2014, I have become more sympathetic to at least the spirit of CEV specifically and to the project of compromise among differing value systems more generally. I continue to think that pure CEV is unlikely to be implemented, though democracy and intellectual discussion can help approximate it. I also continue to feel apprehensive about the conclusions that a CEV might reach, but the best should not be the enemy of the good, and cooperation is inherently about not getting everything you want in order to avoid getting nothing at all."
Introduction
I'm often asked questions like the following: If wild-animal suffering, lab universes, sentient simulations, etc. are so bad, why can't we assume that Coherent Extrapolated Volition (CEV) will figure that out and do the right thing for us?
Disclaimer
Most of my knowledge of CEV is based on Yudkowsky's 2004 paper, which he admits is obsolete. I have not yet read most of the more recent literature on the subject.
Reason 1: CEV will (almost certainly) never happen
CEV is like a dream for a certain type of moral philosopher: Finally, the most ideal solution for discovering what we really want upon reflection!
The fact is, the real world is not decided by moral philosophers. It's decided by power politics, economics, and Darwinian selection. Moral philosophers can certainly have an impact through these channels, but they're unlikely to convince the world to rally behind CEV. Can you imagine the US military -- during its AGI development process -- deciding to adopt CEV? No way. It would adopt something that ensures the continued military and political dominance of the US, driven by mainstream American values. Same goes for China or any other country. If AGI is developed by a corporation, the values will reflect those of the corporation or the small group of developers and supervisors who hold the most power over the project. Unless that group is extremely enlightened, CEV is not what we'll get.
Anyway, this is assuming that the developers of AGI can even keep it under control. Most likely AGI will turn into a paperclipper or else evolve into some other kind of Darwinian force over which we lose control.
Objection 1: "Okay. Future military or corporate developers of AGI probably won't do CEV. But why do you think they'd care about wild-animal suffering, etc. either?"
Well, they might not, but if we make the wild-animal movement successful, then in ~50-100 years when AGI does come along, the notion of not spreading wild-animal suffering might be sufficiently mainstream that even military or corporate executives would care about it, at least to some degree.
If post-humanity does achieve astronomical power, it will only be through AGI, so there's high value in influencing the future developers of an AGI. For this reason I believe we should focus our meme-spreading on those targets. However, this doesn't mean they should be our only focus, for two reasons: (1) Future AGI developers will themselves be influenced by their friends, popular media, contemporary philosophical and cultural norms, etc., so if we can change those things, we will diffusely impact future AGI developers too. (2) We need to build our movement, and the lowest-hanging fruit among potential new supporters is those most interested in the cause (e.g., antispeciesists, environmental-ethics students, transhumanists). We should reach out to them to expand our base of support before going after the big targets.
Objection 2: "Fine. But just as we can advance values like preventing the spread of wild-animal suffering, couldn't we also increase the likelihood of CEV by promoting that idea?"
Sure, we could. The problem is, CEV is not an optimal thing to promote, IMHO. It's sufficiently general that lots of people would want it, so for ourselves, the higher leverage comes from advancing our particular, more idiosyncratic values. Promoting CEV is kind of like promoting democracy or free speech: It's fine to do, but if you have a particular cause that you think is more important than other people realize, it's probably going to be better to promote that specific cause than to jump on the bandwagon and do the same thing everyone else is doing, since the bandwagon's cause may not be what you yourself prefer.
Indeed, for myself, it's possible CEV could be a net bad thing, if it would reduce the likelihood of paperclipping -- a future which might (or might not) contain far less suffering than a future directed by humanity's extrapolated values.
Reason 2: CEV would lead to values we don't like
Some believe that morality is absolute, in which case a CEV's job would be to uncover what that absolute morality is. This view is mistaken, for two reasons: (1) the existence of a separate realm of reality where ethical truths reside violates Occam's razor, and (2) even if such truths did exist, why would we care what they were?
Yudkowsky and the LessWrong community agree that ethics is not absolute, so they have different motivations behind CEV. As far as I can gather, the following are two of them:
Motivation 1: Some believe CEV is genuinely the right thing to do
As Eliezer said in his 2004 paper (p. 29), "Implementing CEV is just my attempt not to be a jerk." Some may believe that CEV is the ideal meta-ethical way to resolve ethical disputes.
I have to differ. First, the set of minds included in CEV is totally arbitrary, and hence so is the output. Why include only humans? Why not animals? Why not dead humans? Why not humans that weren't born but might have been? Why not paperclip maximizers? Baby eaters? Pebble sorters? Suffering maximizers? Wherever you draw the line, there you're already inserting your values into the process.
And then once you've picked the set of minds to extrapolate, you still have astronomically many ways to do the extrapolation, each of which could give wildly different outputs. Humans have a thousand random shards of intuition about values that resulted from all kinds of little, arbitrary perturbations during evolution and environmental exposure. If the CEV algorithm happens to make some more salient than others, this will potentially change the outcome, perhaps drastically (butterfly effects).
Now, I would be in favor of a reasonable extrapolation of my own values. But humanity's values are not my values. There are people who want to spread life throughout the universe regardless of suffering, people who want to preserve nature free from human interference, people who want to create lab universes because it would be cool, people who oppose utilitronium and support retaining suffering in the world, people who want to send members of other religions to eternal torture, people who believe sinful children should burn forever in red-hot ovens, and on and on. I do not want these values to be part of the mix.
Maybe (hopefully) some of these beliefs would go away once people learned more about what these wishes really implied, but some would not. Take abortion, for example: Some non-religious people genuinely oppose it, and not for trivial, misinformed reasons. They have thought long and hard about abortion and still find it to be wrong. Others have thought long and hard and still find it to be not wrong. At some point, we have to admit that human intuitions are genuinely in conflict in an irreconcilable way. Some human intuitions are irreconcilably opposed to mine, and I don't want them in the extrapolation process.
Motivation 2: Some argue that even if CEV isn't ideal, it's the best game-theoretic approach because it amounts to cooperating on the prisoner's dilemma
I think the idea is that if you try to promote your specific values above everyone else's, then you're timelessly causing this to be the decision of other groups of people who want to push for their values instead. But if you decided to cooperate with everyone, you would timelessly influence others to do the same.
This seems worth considering, but I doubt the argument is compelling enough to take very seriously. I can almost guarantee that if I decided to start cooperating by working toward CEV, everyone else working to shape the values of the future wouldn't suddenly jump on board and do the same.
Objection 1: "Suppose CEV did happen. Then spreading concern for wild animals and the like might have little value, because the CEV process would realize that you had tried to rig the system ahead of time by making more people care about the cause, and it would attempt to neutralize your efforts."
Well, first of all, CEV is (almost certainly) never going to happen, so I'm not too worried. Second of all, it's not clear to me that such a scheme would actually be put in place. If you're trying to undo pre-CEV influences that led to the distribution of opinions to that point, you're going to have a heck of a lot of undoing to do. Are you going to undo the abundance of Catholics because their religion discouraged birth control and so led to large numbers of supporters? Are you going to undo the over-representation of healthy humans because natural selection unfairly removed all those sickly ones? Are you going to undo the under-representation of dinosaurs because an arbitrary asteroid killed them off before CEV came around?
The fact is that who has power at the time of AGI will probably matter a lot. If we can improve the values of those who will have power in the future, this will in expectation lead to better outcomes -- regardless of whether the CEV fairy tale comes true.
Another thing to worry about with CEV is that the nonperson predicates chosen by whoever writes it may flag as nonpersons things that you consider people, or things you would not like to see destroyed at the end of an instrumental simulation.
Humans probably have no built-in intuitions precise enough to draw the distinctions among things deserving ethical consideration that a nonperson predicate requires: one that can flag as nonpersons things useful for instrumental simulations, yet not flag a fully detailed simulation of you or me as a nonperson. We don't have detailed enough introspection to know what "sentience" (whatever that means) is at a mechanical level. How can we care about the arrangement of parts that makes "sentience," when we don't know what that arrangement is?
I think the process by which some people come to care about animals and others do not is probably highly dependent on which thought experiments they considered in which order, and on which label they first used for the mental category of "things that shouldn't be hurt."
The most memorable occasion when my person predicate changed was when I used to think that people could only exist in a basement universe. Simulations were automatically nonpersons. I thought to myself, "If they aren't real, I don't care." What changed my mind was the thought: "If you ran a simulated version of me, and informed it that it was in a simulation, would it stop simulatedly caring about itself?" (The answer was no.)

But what if I had read LessWrong first, and become accustomed to thinking of myself as an insane (objectively speaking, not by human standards) biased ape, and said, "No, but that's because I'm only human and sometimes have feelings that are contrary to my true ideal utility function. The simulated version may not alieve that he was not real, but he really wouldn't be real, so he Ideal_Mestroyer::should stop caring about himself"? That thought isn't factually incorrect. If I had thought it back then, I might still care about "realness" in the same sense. But thinking about it now, it is too late: my terminal values have already changed, perhaps because of a misstep in my reasoning back then, and I am glad they have. But maybe the introduction of "real" (being directly made of physics and not in a simulation) as an important factor was originally based on mistaken reasoning too.
I think most of the features of our nonperson predicates are decided in the same way: partly randomly, based on reasoning mistakes and which thought experiments were considered first (more randomly the more philosophical the person is), and partly through absorption from family and peers. This means it doesn't make sense for there to be a coherent extrapolated nonperson predicate for humanity (though you can still superpose a bunch of different ones).
Even if you don't really care about animals, your "person" category (or just "I care about this being" category) might be broader than SIAI's, and if it is, you should be afraid that vast numbers of people will be killed by terminating instrumental simulations.
Even so, if your person predicate is part of the CEV of humanity, perhaps an FAI could self-modify to adopt it (after running some number of simulations using the old predicate, a number that wasn't really that big compared to the number of people who would exist in a post-friendly-foom world).
So those people might not be that important to you, compared to what else is at stake. But if your nonperson predicate is not in humanity's CEV, and is uncommon enough that it's not worth it to humanity to accommodate you, and you disvalue death (and not just suffering), then CEV might cause you to spend billions of years screaming.
Interesting story. Yes, I think our intuitions about what kinds of computations we want to care about are easily bent and twisted depending on the situation at hand. In analogy with Dennett's "intentional stance," humans have a "compassionate stance" that we apply to some physical operations and don't apply to others. It's not too hard to manipulate these intuitions by thought experiments. So, yes, I do fear that other people may differ (perhaps quite a bit) in their views about what kinds of computations are suffering that we should avoid.