(This page is about extrapolated volition as a normative moral theory - that is, the theory that extrapolated volition captures the concept of value or what outcomes we should want. For the closely related proposal about what a sufficiently advanced self-directed AGI should be built to want/target/decide/do, see coherent extrapolated volition.)
Extrapolated volition is the notion that when we ask "What is right?", then insofar as we're asking something meaningful, we're asking about the result of running a certain logical function over possible states of the world, where this function is obtained by extrapolating our current decision-making process in directions such as "What if I knew more?", "What if I had time to consider more arguments (so long as the arguments weren't hacking my brain)?", or "What if I understood myself better and had more self-control?"
A very simple example of extrapolated volition might be to consider somebody who asks you to bring them orange juice from the refrigerator. You open the refrigerator and see no orange juice, but there's lemonade. You imagine that your friend would want you to bring them lemonade if they knew everything you knew about the refrigerator, so you bring them lemonade instead. On an abstract level, we can say that you "extrapolated" your friend's "volition", in other words, you took your model of their mind and decision process, or your model of their "volition", and you imagined a counterfactual version of their mind that had better information about the contents of your refrigerator, thereby "extrapolating" this volition.
Having better information isn't the only way that a decision process can be extrapolated; we can also, for example, imagine that a mind has more time in which to consider moral arguments, or better knowledge of itself. Maybe you currently want revenge on the Capulet family, but if somebody had a chance to sit down with you and have a long talk about how revenge affects civilizations in the long run, you could be talked out of that. Maybe you're currently convinced that you advocate for green shoes to be outlawed out of the goodness of your heart, but if you could actually see a printout of all of your own emotions at work, you'd see there was a lot of bitterness directed at people who wear green shoes, and this would change your mind about your decision.
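To make the "extrapolation as a function over a decision process" framing concrete, here is a minimal toy sketch in Python of the orange-juice example above. Everything in it (the Mind class, choose, extrapolate, and the numbers) is invented for illustration and is not part of any actual formalism; it only models the "What if I knew more?" direction.

```python
from dataclasses import dataclass, replace

# Toy illustration only -- the Mind class, choose(), and extrapolate() are
# invented for this sketch. A "mind" here is just a preference ranking plus
# beliefs about which options exist, and "extrapolation" asks what that mind
# would choose if its beliefs were corrected: the "What if I knew more?" direction.

@dataclass(frozen=True)
class Mind:
    believed_options: frozenset   # what the person thinks is in the fridge
    preferences: dict             # option -> how much the person values it

def choose(mind: Mind) -> str:
    """The option this mind would ask for, given its current beliefs."""
    return max(mind.believed_options, key=lambda o: mind.preferences.get(o, 0))

def extrapolate(mind: Mind, actual_options: frozenset) -> Mind:
    """A counterfactual version of the mind that knows what is actually available."""
    return replace(mind, believed_options=actual_options)

friend = Mind(
    believed_options=frozenset({"orange juice"}),
    preferences={"orange juice": 10, "lemonade": 7, "nothing": 0},
)
fridge = frozenset({"lemonade", "nothing"})

print(choose(friend))                       # "orange juice" -- what they asked for
print(choose(extrapolate(friend, fridge)))  # "lemonade" -- their extrapolated volition
```

The same pattern would, in principle, counterfactually vary the mind along the other directions as well, giving it more time to weigh arguments or better knowledge of itself, though those are far harder to model than a missing fact about a refrigerator.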
In Yudkowsky's version of extrapolated volition considered on an individual level, the three core directions of extrapolation are:
- Increased knowledge: what you would decide if you knew more about the world and about the true consequences of the options in front of you.
- Increased consideration of arguments: what you would decide if you had time to consider more moral arguments (so long as those arguments weren't hacking your brain).
- Increased reflectivity: what you would decide if you understood yourself better and had more self-control.
Different people initially react to the question "Where should we point a superintelligence?" in different ways and intuitively approach it from different angles (and we'll eventually need an Arbital dispatching questionnaire on a page that handles this). These angles include:
Standard starting replies:
Arguendo, according to CEV's advocates, these conversations all eventually converge on Coherent Extrapolated Volition as an alignment proposal, arriving by different roads. "Extrapolated volition" is the corresponding normative theory that you arrive at along conversational roads that tour through questioning the meaning of 'right' or trying to figure out what we should really truly do.
(Work in progress. This page is a stub and doesn't try to write out the actual dialogues yet.)
Metaethics is the field of academic philosophy that deals not with the question "What is good?" but with the question "What sort of property is goodness?" Rather than arguing over what is good, we are, from the perspective of AI-grade philosophy, asking how to compute what is good, and why the output of that computation ought to be identified with the notion of shouldness.
In metaethical terms, EV would say that for each person at a single moment in time, 'right' or 'should' is to be identified with a logical constant that is fixed for that person at that moment, namely the result of running the extrapolation process on that person. We can't actually run the extrapolation process, so we can't get perfect knowledge of this logical constant, and we remain subjectively uncertain about what is right.
To eliminate one important ambiguity in how this might cash out, we regard this logical constant as being analytically identified with the extrapolation of our brains, but not counterfactually dependent on counterfactually varying forms of our brains. If you imagine being administered a pill that makes you want to kill people, then you shouldn't compute in your imagination that different things are right for this new self. Instead, this new self now wants to do something other than what is right. We can meaningfully say, "Even if I (a counterfactual version of me) wanted to kill people, that wouldn't make it right" because the counterfactual alteration of the self doesn't change the logical object that you mean by saying 'right'.
However, there's still an analytic relation between this logical object and your actual mindstate, baked into the very meaning of discourse about shouldness, which means that you can get veridical information about this logical object by having a sufficiently intelligent AI run an approximation of the extrapolation process over a good model of your actual mind. If a sufficiently intelligent and trustworthy AGI tells you that after thinking about it for a while you wouldn't want to eat cows, you have gained veridical information about whether it's right to eat cows.
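As a similarly toy sketch (invented names and numbers, not a real formalism), the rigid-designation point above can be pictured like this: the logical constant picked out by 'right' is anchored to the actual mind at the moment of evaluation, so counterfactually altering the mind changes what that mind wants, not what is right.

```python
# Toy sketch of the rigid-designation point (all names and numbers invented):
# 'right' for a person at a moment is fixed by extrapolating their *actual*
# preferences; a counterfactually altered self is judged against that fixed
# constant rather than getting to redefine it.

ACTUAL_PREFERENCES = {"spare people": 10, "kill people": -100}

def extrapolated_choice(preferences: dict) -> str:
    """Stand-in for the full extrapolation process: pick the most-valued act."""
    return max(preferences, key=preferences.get)

# The logical constant 'right' is anchored to the actual mind at this moment...
RIGHT = extrapolated_choice(ACTUAL_PREFERENCES)   # "spare people"

# ...so the counterfactual "pill that makes you want to kill people" self
# doesn't change what is right; it simply wants something other than what is right.
pill_preferences = {"spare people": -100, "kill people": 10}
wants = extrapolated_choice(pill_preferences)     # "kill people"

assert RIGHT == "spare people" and wants != RIGHT
```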
Within the standard terminology of academic metaethics, "extrapolated volition" as a normative theory is:
- Cognitivist: sentences about what is right have truth values; we can be subjectively uncertain about, and gain veridical information about, what is right.
- Naturalist and reductionist: rightness is identified with a logical fact about the extrapolation of actual human minds, rather than with an irreducible non-natural property baked into the universe.
Its closest antecedents in academic metaethics are Rawls and Goodman's reflective equilibrium, Harsanyi and Railton's ideal advisor theories, and Frank Jackson's moral functionalism.
Argument. If extrapolated volition is analytically equivalent to good, then the question "Is it true that extrapolated volition is good?" is meaningless or trivial. However, this question is not meaningless or trivial, and seems to have an open quality about it. Therefore, extrapolated volition is not analytically equivalent to goodness.
Reply. Extrapolated volition is not supposed to be transparently identical to goodness. The normative identity between extrapolated volition and goodness is allowed to be something that you would have to think for a while and consider many arguments to perceive. Natively, human beings don't start out with any kind of explicit commitment to a particular metaethics; our brains just compute a feeling of rightness about certain acts, and then sometimes update and say that acts we previously thought were right are not-right. When we go from that, to trying to draw a corresponding logical function that we can see our brains as approximating, and updating when we learn new things or consider new arguments, we are carrying out a project of "rescuing the utility function" by reasoning that this is the best particular abstract thing we could be said to be reasoning about, rather than, say, an irreducible non-natural rightness property baked into the universe. It's not very surprising if this bit of philosophy takes longer than five minutes to reason through.