
Extrapolated volition (normative moral theory)

Written by Eliezer Yudkowsky

(This page is about extrapolated volition as a normative moral theory - that is, the theory that extrapolated volition captures the concept of value or what outcomes we should want. For the closely related proposal about what a sufficiently advanced self-directed AGI should be built to want/target/decide/do, see coherent extrapolated volition.)

Concept

Extrapolated volition is the notion that when we ask "What is right?", then insofar as we're asking something meaningful, we're asking about the result of running a certain logical function over possible states of the world, where this function is obtained by extrapolating our current decision-making process in directions such as "What if I knew more?", "What if I had time to consider more arguments (so long as the arguments weren't hacking my brain)?", or "What if I understood myself better and had more self-control?"

A very simple example of extrapolated volition might be to consider somebody who asks you to bring them orange juice from the refrigerator. You open the refrigerator and see no orange juice, but there's lemonade. You imagine that your friend would want you to bring them lemonade if they knew everything you knew about the refrigerator, so you bring them lemonade instead. On an abstract level, we can say that you "extrapolated" your friend's "volition": you took your model of their mind and decision process, or your model of their "volition", and imagined a counterfactual version of their mind that had better information about the contents of the refrigerator, thereby "extrapolating" this volition.
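To make the structure of this example concrete, here is a minimal toy sketch, assuming a crude model of the friend as a fixed preference ranking plus a set of beliefs about the fridge. The `Friend` class and `extrapolate_with_information` helper are invented purely for illustration and are not part of any actual proposal.

```python
# Toy illustration of extrapolating a friend's volition by giving their
# modeled decision process better information.  Every name here is a
# hypothetical stand-in invented for this sketch.

from dataclasses import dataclass, field

@dataclass
class Friend:
    """A crude model of the friend's decision process."""
    believed_fridge: list = field(default_factory=list)  # what they think is in the fridge
    ranking: list = field(default_factory=list)          # drinks, most to least preferred

    def choose(self):
        """Ask for the most-preferred drink they believe to be available."""
        for drink in self.ranking:
            if drink in self.believed_fridge:
                return drink
        return None

def extrapolate_with_information(friend, observed_fridge):
    """Return a counterfactual friend whose beliefs about the fridge are
    replaced by what you actually observed; their preferences are untouched."""
    return Friend(believed_fridge=list(observed_fridge), ranking=friend.ranking)

friend = Friend(believed_fridge=["orange juice"],
                ranking=["orange juice", "lemonade", "water"])
print(friend.choose())  # -> "orange juice" (what they actually asked for)

observed = ["lemonade", "water"]            # what you see when you open the fridge
informed = extrapolate_with_information(friend, observed)
print(informed.choose())  # -> "lemonade" (what their extrapolated volition asks for)
```

Nothing about the friend's preferences changed; only the information fed into their modeled decision process did.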

Having better information isn't the only way that a decision process can be extrapolated; we can also, for example, imagine that a mind has more time in which to consider moral arguments, or better knowledge of itself. Maybe you currently want revenge on the Capulet family, but if somebody had a chance to sit down with you and have a long talk about how revenge affects civilizations in the long run, you could be talked out of that. Maybe you're currently convinced that you advocate for green shoes to be outlawed out of the goodness of your heart, but if you could actually see a printout of all of your own emotions at work, you'd see there was a lot of bitterness directed at people who wear green shoes, and this would change your mind about your decision.

In Yudkowsky's version of extrapolated volition considered on an individual level, the three core directions of extrapolation are (a schematic sketch follows the list):

  • Increased knowledge - having more veridical knowledge of declarative facts and expected outcomes.
  • Increased consideration of arguments - being able to consider more possible arguments and assess their validity.
  • Increased reflectivity - greater knowledge about the self, and to some degree, greater self-control (though this raises further questions about which parts of the self normatively get to control which other parts).
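Schematically, these three directions can be pictured as knobs on a single extrapolation operator. The sketch below is only an illustration under invented assumptions: the `Mind` class, the `extrapolate` function, and all of its parameters are hypothetical stand-ins, not a formalization anyone has committed to.

```python
# Schematic sketch of extrapolation as an operator with three knobs:
# more knowledge, more arguments considered, more reflectivity.
# All names and structures here are invented for illustration.

from copy import deepcopy
from dataclasses import dataclass, field

@dataclass
class Mind:
    """A crude stand-in for a person's decision process."""
    beliefs: set = field(default_factory=set)
    considered_arguments: set = field(default_factory=set)
    self_knowledge: float = 0.0   # 0.0 = opaque to itself, 1.0 = fully transparent

def extrapolate(mind, extra_facts=(), extra_arguments=(), extra_reflectivity=0.0):
    """Return a counterfactual, idealized version of `mind`.
    The actual mind is never modified."""
    ideal = deepcopy(mind)
    ideal.beliefs |= set(extra_facts)                    # "What if I knew more?"
    ideal.considered_arguments |= set(extra_arguments)   # "What if I weighed more arguments?"
    ideal.self_knowledge = min(1.0, ideal.self_knowledge + extra_reflectivity)
    return ideal

me = Mind(beliefs={"revenge would feel satisfying"})
idealized_me = extrapolate(
    me,
    extra_facts={"revenge tends to feed long cycles of violence"},
    extra_arguments={"how revenge affects civilizations in the long run"},
    extra_reflectivity=0.5,
)
# On this theory, "what is right (for me, now)" is identified with what
# idealized_me would decide, not with what `me` decides today.
```

Which parts of the self normatively get control as `self_knowledge` rises is deliberately left open in this sketch, mirroring the open question noted in the third bullet.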

Motivation

People initially react to the question "Where should we point a superintelligence?" in different ways, and intuitively approach it from different angles (we'll eventually need an Arbital dispatching questionnaire on a page that handles this). These angles include:

  • All this talk of 'shouldness' is just a cover for the fact that whoever gets to build the superintelligence wins all the marbles; no matter what you do with your superintelligence, you'll be the one who does it.
  • What if we tell the superintelligence what to do and it's the wrong thing? What if we're basically confused about what's right? Shouldn't we let the superintelligence figure that out on its own with its own superior intelligence?
  • Imagine the Ancient Greeks telling a superintelligence what to do. They'd have told it to optimize personal virtues, including, say, a glorious death in battle. This seems like a bad outcome, and we need to figure out how not to do the analogous thing; by the same token, telling an AGI to do whatever seems like a good idea to us now will also end up seeming like a very regrettable decision a million years later.
  • Obviously we should just tell the AGI to optimize liberal democratic values. Liberal democratic values are good. The real threat is if bad people get their hands on AGI and build an AGI that doesn't optimize liberal democratic values.

Standard starting replies:

  • Okay, but suppose you're a programmer and you're trying not to be a jerk. If you're like, "Well, whatever I do originates in myself and is therefore equally selfish, so I might as well declare myself God-Emperor of Earth," you're being a jerk. Is there anything we can do which is less jerky, and indeed, minimally jerky?
  • If you say you have no information at all about what's 'right', then what does the term even mean? If I might as well have my AGI maximize paperclips and you have no ground on which to stand and say that's the wrong way to compute normativity, then what are we even talking about in the first place? The word 'right' or 'should' must have some meaning that you know about, even if it doesn't automatically print out a list of everything you know is right. Let's talk about hunting down that meaning.
  • Okay, so what should the Ancient Greeks have done if they did have to program an AI? How could they not have doomed future generations? Suppose the Ancient Greeks are clever enough to have noticed that sometimes people change their minds about things and to realize that they might not be right about everything. How can they use the cleverness of the AGI in a constructively specified, computable fashion that gets them out of this hole? You can't just tell the AGI to compute what's 'right', you need to put an actual computable question in there, not a word.
  • What if you would, after some further discussion, want to tweak your definition of "liberal democratic values" just a little? What if it's predictable that you would do that? Would you really want to be stuck with your off-the-cuff definition a million years later?

Arguendo, per CEV's advocates, these conversations all eventually converge by different roads on Coherent Extrapolated Volition as an alignment proposal. "Extrapolated volition" is the corresponding normative theory, arrived at along the conversational roads that tour through questioning the meaning of 'right' or trying to figure out what we should really truly do.

(Work in progress. This page is a stub and doesn't try to write out the actual dialogues yet.)

Situating EV in contemporary metaethics

Metaethics is the field of academic philosophy that deals with the question, not of "What is good?", but "What sort of property is goodness?" Rather than arguing over what is good, we are, from the perspective of AI-grade philosophy, asking how to compute what is good, and why the output of that computation ought to be identified with the notion of shouldness.

In metaethical terms, EV would say that for each person at a single moment in time, 'right' or 'should' is to be identified with a (subjectively uncertain) logical constant fixed for that person at that moment, namely the result of running the extrapolation process on that person. We can't actually run the extrapolation process, so we can't get perfect knowledge of this logical constant, and we will remain subjectively uncertain about what is right.

To eliminate one important ambiguity in how this might cash out, we regard this logical constant as being analytically identified with the extrapolation of our brains, but not counterfactually dependent on counterfactually varying forms of our brains. If you imagine being administered a pill that makes you want to kill people, then you shouldn't compute in your imagination that different things are right for this new self. Instead, this new self now wants to do something other than what is right. We can meaningfully say, "Even if I (a counterfactual version of me) wanted to kill people, that wouldn't make it right" because the counterfactual alteration of the self doesn't change the logical object that you mean by saying 'right'.
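This rigidity can be pictured in a small sketch, purely illustrative and under invented assumptions: `extrapolate` stands in for the (practically uncomputable) extrapolation process, and the mind objects are toy dictionaries. The point is only that 'right' is bound to the actual mind, so a counterfactually altered mind defines a different question rather than a different answer.

```python
# Illustrative sketch of the rigid-designation point: 'right' is fixed by
# extrapolating the *actual* mind, and does not track counterfactual
# variants of that mind.  All names here are hypothetical stand-ins.

from copy import deepcopy

def extrapolate(mind):
    """Toy stand-in for the extrapolation process (more knowledge, etc.)."""
    ideal = deepcopy(mind)
    ideal["knows_more"] = True
    return ideal

actual_me = {"values": ["kindness"], "knows_more": False}

# 'Right' is analytically identified with the extrapolation of the actual mind...
RIGHT = extrapolate(actual_me)

# ...so a pill-altered counterfactual self defines a different question
# ("what would pill-me want?"), not a different answer to "what is right?".
pill_me = deepcopy(actual_me)
pill_me["values"] = ["killing people"]
what_pill_me_wants = extrapolate(pill_me)

assert RIGHT == extrapolate(actual_me)   # the counterfactual changes nothing here
assert what_pill_me_wants != RIGHT       # pill-me wants something other than what is right
```

Binding `RIGHT` to the actual mindstate, rather than recomputing it inside the counterfactual, is exactly what licenses saying "even if I wanted to kill people, that wouldn't make it right."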

However, there's still an analytic relation between this logical object and your actual mindstate, baked into the very meaning of discourse about shouldness, which means that you can get veridical information about this logical object by having a sufficiently intelligent AI run an approximation of the extrapolation process over a good model of your actual mind. If a sufficiently intelligent and trustworthy AGI tells you that after thinking about it for a while you wouldn't want to eat cows, you have gained veridical information about whether it's right to eat cows.

Within the standard terminology of academic metaethics, "extrapolated volition" as a normative theory is:

  • Cognitivist. Normative propositions can be true or false. You can believe that something is right and be mistaken.
  • Naturalist. Normative propositions are not irreducible or based on non-natural properties of the world.
  • Externalist / not internalist. It is not the case that all sufficiently powerful optimizers must act on what we consider to be moral propositions. A paperclipper does what is clippy, not what is right, and the fact that it's trying to turn everything into paperclips does not indicate a disagreement with you about what is right any more than you disagree about what is clippy.
  • Reductionist. The whole point of this theory is that it's the sort of thing you could potentially compute.
  • More synthetic reductionist than analytic reductionist. We don't have a priori knowledge of our starting mindstate and don't have enough computing power to complete the extrapolation process over it. Therefore, we can't figure out what our extrapolated volition would say just by pondering the meanings of words.

Closest antecedents in academic metaethics are Rawls and Goodman's reflective equilibrium, Harsanyi and Railton's ideal advisor theories, and Frank Jackson's moral functionalism.

EV's replies to standard metaethical questions

Moore's Open Question

Argument. If extrapolated volition is analytically equivalent to good, then the question "Is it true that extrapolated volition is good?" is meaningless or trivial. However, this question is not meaningless or trivial, and seems to have an open quality about it. Therefore, extrapolated volition is not analytically equivalent to goodness.

Reply. Extrapolated volition is not supposed to be transparently identical to goodness. The normative identity between extrapolated volition and goodness is allowed to be something that you would have to think for a while and consider many arguments to perceive. Natively, human beings don't start out with any kind of explicit commitment to a particular metaethics; our brains just compute a feeling of rightness about certain acts, and then sometimes update and say that acts we previously thought were right are not-right. When we go from that, to trying to draw a corresponding logical function that we can see our brains as approximating, and updating when we learn new things or consider new arguments, we are carrying out a project of "rescuing the utility function" by reasoning that this is the best particular abstract thing we could be said to be reasoning about, rather than, say, an irreducible non-natural rightness property baked into the universe. It's not very surprising if this bit of philosophy takes longer than five minutes to reason through.