(This page is about extrapolated volition as a normative moral theory - that is, the theory that extrapolated volition captures the concept of value or what outcomes we should want. For the closely related proposal about what a sufficiently advanced self-directed AGI should be built to want/target/decide/do, see coherent extrapolated volition.)
Extrapolated volition is the notion that when we ask "What is right?", then insofar as we're asking something meaningful, we're asking about the result of running a certain logical function over possible states of the world, where this function is obtained by extrapolating our current decision-making process in directions such as "What if I knew more?", "What if I had time to consider more arguments (so long as the arguments weren't hacking my brain)?", or "What if I understood myself better and had more self-control?"
A very simple example of extrapolated volition might be to consider somebody who asks you to bring them orange juice from the refrigerator. You open the refrigerator and see no orange juice, but there's lemonade. You imagine that your friend would want you to bring them lemonade if they knew everything you knew about the refrigerator, so you bring them lemonade instead. On an abstract level, we can say that you "extrapolated" your friend's "volition": you took your model of their mind and decision process (your model of their "volition") and imagined a counterfactual version of that mind which had better information about the contents of your refrigerator, thereby "extrapolating" their volition.
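As a loose illustration only (none of these names or structures come from the theory itself), the refrigerator example can be written as a toy decision procedure: a "volition" is modeled as a function from beliefs to a choice, and extrapolating it along the "What if I knew more?" direction just means rerunning that same function on corrected beliefs.

```python
# Toy sketch (hypothetical names): a "volition" is a decision process that maps
# beliefs about the world to a choice; extrapolation reruns it on better beliefs.

def friends_decision(beliefs):
    """The friend's decision process: ask for the most-preferred drink
    that their beliefs say is available."""
    preference_order = ["orange juice", "lemonade", "water"]
    available = [drink for drink in preference_order if beliefs.get(drink, False)]
    return available[0] if available else None

# What the friend (mistakenly) believes about the refrigerator.
actual_beliefs = {"orange juice": True, "lemonade": False}

# What you learn by opening the refrigerator.
observed_facts = {"orange juice": False, "lemonade": True}

def extrapolate_knew_more(decision_process, beliefs, new_facts):
    """Extrapolate along 'What if I knew more?': rerun the unchanged
    decision process on beliefs corrected by the new facts."""
    return decision_process({**beliefs, **new_facts})

print(friends_decision(actual_beliefs))                                        # -> 'orange juice'
print(extrapolate_knew_more(friends_decision, actual_beliefs, observed_facts))  # -> 'lemonade'
```

The point of the sketch is only that the decision process itself is held fixed; what changes under extrapolation is the information it runs on.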
Having better information isn't the only way that a decision process can be extrapolated; we can also, for example, imagine that a mind has more time in which to consider moral arguments, or better knowledge of itself. Maybe you currently want revenge on the Capulet family, but if somebody had a chance to sit down with you and have a long talk about how revenge affects civilizations in the long run, you could be talked out of that. Maybe you're currently convinced that you advocate for green shoes to be outlawed out of the goodness of your heart, but if you could actually see a printout of all of your own emotions at work, you'd see there was a lot of bitterness directed at people who wear green shoes, and this would change your mind about your decision.
In Yudkowsky's version of extrapolated volition considered on an individual level, the three core directions of extrapolation are:

- Increased knowledge ("What if I knew more?")
- Increased consideration of arguments ("What if I had time to consider more arguments, so long as the arguments weren't hacking my brain?")
- Increased reflectivity and self-control ("What if I understood myself better and had more self-control?")
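Continuing the toy model above, the three core directions can be pictured as transformations that take one decision process and return an "extrapolated" one. The sketch below is purely illustrative; every function name in it is hypothetical rather than anything specified by the theory, and a real extrapolation would be vastly harder than wrapping a Python function.

```python
# Hypothetical sketch: each core direction as a wrapper that turns a decision
# process (beliefs -> choice) into an extrapolated decision process.

def knew_more(decide, extra_facts):
    """'What if I knew more?': decide as if extra_facts were also believed."""
    return lambda beliefs: decide({**beliefs, **extra_facts})

def considered_more_arguments(decide, update, arguments):
    """'What if I had time to consider more arguments?': let each argument
    (assumed not to be brain-hacking) update the beliefs before deciding."""
    def decide_after_reflection(beliefs):
        for argument in arguments:
            beliefs = update(beliefs, argument)
        return decide(beliefs)
    return decide_after_reflection

def understood_self_better(decide, self_knowledge):
    """'What if I understood myself better?': fold an accurate self-model
    into the beliefs the decision process runs on."""
    return lambda beliefs: decide({**beliefs, **self_knowledge})

# Composing the wrappers gives one crude "extrapolated volition", e.g.:
# extrapolated = understood_self_better(
#     considered_more_arguments(knew_more(decide, facts), update, arguments),
#     self_knowledge)
```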
Different people initially react to the question "Where should we point a superintelligence?" differently, and intuitively approach it from different angles (eventually we'll need an Arbital dispatching questionnaire on a page that handles this). These angles include:
Standard starting replies:
Arguendo (according to the theory's advocates), these conversations all eventually converge on EV and CEV by different roads.
(Work in progress. This page is a stub and doesn't try to write out the actual dialogues yet.)
Metaethics is the field of academic philosophy that deals not with the question "What is good?" but with the question "What sort of property is goodness?" Rather than arguing over what is good, we are, from the perspective of AI-grade philosophy, asking how to compute what is good, and why the output of that computation ought to be identified with the notion of shouldness. Theories that try to describe what should be done, rather than what is, are said to be normative rather than descriptive theories.
Within the standard terminology of academic metaethics, "extrapolated volition" as a normative theory is cognitivist (normative propositions can be true or false), naturalist (normative propositions are not irreducible or based on non-natural properties of the world), not internalist (it is not the case that all sufficiently powerful optimizers must act on what we consider to be moral propositions), and reductionist in a way that's more synthetic than analytic (we don't have a priori knowledge of what our own preferences are). Its closest antecedents in academic metaethics are Rawls and Goodman's reflective equilibrium, Harsanyi and Railton's ideal advisor theories, and Frank Jackson's moral functionalism / analytic descriptivism.
[comment:
== need to rewrite this at much greater length later, not try to hack it now ==
Responses to standard probes:
"Okay, it turns out that your AI says that eliminating malaria is what I would want to do if I knew everything, thought very fast without letting my brain be hacked, and fully understood myself, and I'm willing to believe your AI computed that correctly, but is it really good? It seems to me like I can imagine that eliminating malaria has this property, but that it still isn't really good, and that the true good is something else entirely."
Reply: "After doing some past updates, hopefully in a validly extrapolative direction, you noticed that it was possible for you to change what you'd previously computed about what to do or what was moral, and since your brain operates on a pretty weird architecture, it does this sort of thing by applying a 'rightness' tag to things. If it makes sense to identify this rightness tag with any referent, that referent is the property of what your brain would update to later. You can imagine the rightness tag just floating around and attaching to whatever,
]
[comment: == need to rewrite this at greater length later, not try to hack it now ==
Besides the three core directions, for purposes of a self-directed AGI alignment target, Yudkowsky further suggests that computing a collective EV might reasonably:
Relative to the three core directions, though, the latter two directions have a less clear normative status.
]
(in progress)