The CEV Sequence Summary: The CEV sequence consists of three posts tackling important aspects of CEV. It covers conceptual, practical, and computational problems with CEV's current form. On What Selves Are draws on the methods of analytic philosophy to clarify the concept of Self, which is necessary in order to understand whose volition is going to be extrapolated by a machine that implements the CEV procedure. Troubles with CEV Part1 and Troubles with CEV Part2, on the other hand, describe several issues the CEV project will face if it is actually implemented. These issues are not of a conceptual nature; many of the objections shown come from scattered discussions found on the web. Finally, some alternatives to CEV are considered.

 

Troubles with CEV Summary: Starting with a summary of CEV, we proceed to several objections: first, specific objections to the use of Coherence, Extrapolation, and Volition. Here Part1 ends. In Part2 we continue with objections related to the end product of performing a CEV, and finally with problems relating to the implementation of CEV. We then go on to praise CEV, pointing out particular strengths of the idea. We end by presenting six alternatives to CEV that have been proposed and considering their vices and virtues.

Meta: I think Troubles With CEV Part1 and Part2 should be posted to Main. So in the comment section of Part2, I have put a place to vote for or against this upgrade.

 

Troubles with CEV Part1

 

Summary of CEV

To begin with, let us recall the most important excerpts from Coherent Extrapolated Volition (CEV).

“Friendly AI requires:

1.  Solving the technical problems required to maintain a well-specified abstract invariant in a self-modifying goal system. (Interestingly, this problem is relatively straightforward from a theoretical standpoint.)

2.  Choosing something nice to do with the AI. This is about midway in theoretical hairiness between problems 1 and 3.

3.  Designing a framework for an abstract invariant that doesn't automatically wipe out the human species. This is the hard part.

But right now the question is whether the human species can field a non-pathetic force in defense of six billion lives and futures.”
“Friendliness is the easiest part of the problem to explain - the part that says what we want. Like explaining why you want to fly to London, versus explaining a Boeing 747; explaining toast, versus explaining a toaster oven.”

“To construe your volition, I need to define a dynamic for extrapolating your volition, given knowledge about you. In the case of an FAI, this knowledge might include a complete readout of your brain-state, or an approximate model of your mind-state. The FAI takes the knowledge of Fred's brainstate, and other knowledge possessed by the FAI (such as which box contains the diamond), does... something complicated... and out pops a construal of Fred's volition. I shall refer to the "something complicated" as the dynamic.”

This is essentially what CEV is: extrapolating Fred's mind, and everyone else's, in order to grok what Fred wants. This is performed from a reading of Fred's psychological states, whether through unlikely neurological paths or through more coarse-grained psychological paths. There is reason to think that a complete readout of a brain is overwhelmingly more complicated than a very good descriptive psychological approximation. We must make sure, though, that this approximation does not rely on our common human psychology to be understood. The descriptive approximation has to be understandable by AGIs, not only by evolutionarily engineered humans. Continuing the summary:

“In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.”

“Had grown up farther together: A model of humankind's coherent extrapolated volition should not extrapolate the person you'd become if you made your decisions alone in a padded cell. Part of our predictable existence is that we predictably interact with other people. A dynamic for CEV must take a shot at extrapolating human interactions, not just so that the extrapolation is closer to reality, but so that the extrapolation can encapsulate memetic and social forces contributing to niceness.”

“the rule [is] that the Friendly AI should be consistent under reflection (which might involve the Friendly AI replacing itself with something else entirely).”

“The narrower the slice of the future that our CEV wants to actively steer humanity into, the more consensus required.”

“The dynamic of extrapolated volition refracts through that cognitive complexity of human minds which lead us to care about all the other things we might want; love, laughter, life, fairness, fun, sociality, self-reliance, morality, naughtiness, and anything else we might treasure.”

“It may be hard to get CEV right - come up with an AI dynamic such that our volition, as defined, is what we intuitively want. The technical challenge may be too hard; the problems I'm still working out may be impossible or ill-defined.”

“The same people who aren't frightened by the prospect of making moral decisions for the whole human species lack the interdisciplinary background to know how much complexity there is in human psychology, and why our shared emotional psychology is an invisible background assumption in human interactions, and why their Ten Commandments only make sense if you're already a human.”

“Even if our coherent extrapolated volition wants something other than a CEV, the programmers choose the starting point of this renormalization process; they must construct a satisfactory definition of volition to extrapolate an improved or optimal definition of volition.”

 

Troubles with CEV

1) Stumbling on People, Detecting the Things CEV Will Extrapolate:

Some of the concepts on which CEV relies may be ill-defined, lacking a stable, consistent structure in thingspace.

CEV relies on many concepts, most notably coherence, extrapolation, and volition. We will discuss the problems of coherence and extrapolation shortly; for now I'd like to invoke a deeper layer of conceptual problems regarding the execution of a machine that implements CEV. A CEV-executing machine ought to be able to identify the kind of entities whose volitions matter to us: the machine must be able to grasp selfhood, or personhood. The concepts of self and person are mingled and complex, and because of their complexity I have dedicated a separate text to the incompleteness, anomalousness, and fine-grainedness of selves.

 

2) Troubles with coherence

2a) The Intrapersonal objection: The volitions of the same person in two different emotional states might differ - it’s as if they were two different people. Is there any good criterion by which a person’s “ultimate” volition may be determined? If not, is it certain that even the volitions of one person’s multiple selves will converge? As explained in detail in Ainslie's “Breakdown of Will”, we are made of lots of tinier interacting time-slices whose conflicts cannot be ignored. My chocolate has value 3 now, 5 when it's in my mouth, and 0 when I reconsider how quick the pleasure was and how long the fat will stay. Valuations conflict not only interpersonally but also intrapersonally. The variation in what we value can be correlated not only with different distances in time, but also with different emotional states, priming, background assumptions, and other ways in which reality hijacks brains for a period.
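Ainslie's point can be put in numbers. Below is a minimal sketch - the rewards, times, and discount constant are all invented for illustration, not taken from the CEV document - of how hyperbolically discounted valuations of the same two options swap rank as the nearer reward approaches: one brain, two conflicting "volitions", depending on when you ask.

```python
# Toy Ainslie-style preference reversal under hyperbolic discounting.
# All numbers are invented for illustration; nothing here comes from the CEV document.

def present_value(reward, time_of_reward, time_now, k=1.0):
    """Hyperbolically discounted value of a future reward."""
    delay = time_of_reward - time_now
    return reward / (1.0 + k * delay)

SMALL_SOONER = (5.0, 10.0)   # (reward, time): the chocolate, available at t=10
LARGE_LATER  = (12.0, 13.0)  # staying slim, "paid out" shortly afterwards

for t in (0.0, 5.0, 9.0):
    v_choc = present_value(*SMALL_SOONER, time_now=t)
    v_diet = present_value(*LARGE_LATER, time_now=t)
    winner = "chocolate" if v_choc > v_diet else "diet"
    print(f"t={t}: chocolate={v_choc:.2f}  diet={v_diet:.2f}  ->  prefers {winner}")
# Far from temptation the diet wins; with the chocolate within reach, the ranking flips.
```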

 

2b) The Biological Onion objection: Our volitions can be thought of as an onion, layers upon layers of beliefs and expectations. The suggestion made by CEV is that when you strip away the layers that do not cohere, you reach deeper regions of the onion. Now, and here is the catch, what if there is no way to get coherence unless you strip away everything that is truly humane, and end up left only with that which is biological? What if, in the service of coherence, we end up stripping away everything that matters and are left only with our biological drives? There is little in common between Eliezer, me, and Al Qaeda terrorists, and most of it is in the so-called reptilian brain. We may end up with a set of goals and desires that are nothing more than “Eat Survive Reproduce,” which would qualify as a major loss in the scheme of things. In this specific case, what ends up dominating CEV is what evolution wants, not what we want. Instead of creating a dynamic with a chance of creating the landscape of a Nice Place to Live, we end up with some exotic extrapolation of simple evolutionary drives. Let us call this failure mode Defeated by Evolution. We are Defeated by Evolution if at any time the destiny of Earth becomes nothing more than Darwinian evolution all over again, at a different level of complexity or at a different speed. So if CEV ends up stripping the biological onion of the goals that matter, extrapolating only a biological core, we are Defeated by Evolution.
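One way to see the worry concretely: if coherence were implemented naively as the intersection of everyone's endorsed goals - an assumption of this sketch, not something the CEV document specifies, and the goal lists are caricatures - then a sufficiently diverse population intersects down to little more than the shared biological core.

```python
# Toy sketch: "coherence" modeled naively as set intersection of endorsed goals.
# The agents and their goal sets are caricatures invented for illustration.

BIOLOGICAL_CORE = {"eat", "survive", "reproduce"}

endorsed_goals = {
    "Eliezer":            BIOLOGICAL_CORE | {"Friendly AI", "rationality", "fun theory"},
    "the author":         BIOLOGICAL_CORE | {"analytic philosophy", "rationality", "art"},
    "Al Qaeda terrorist": BIOLOGICAL_CORE | {"martyrdom", "global theocracy"},
}

coherent_goals = set.intersection(*endorsed_goals.values())
print(coherent_goals)   # {'eat', 'survive', 'reproduce'} -- only the reptilian layer coheres
```

A real extrapolation dynamic would presumably do something far subtler than intersection; the sketch only shows why "strip whatever does not cohere" risks the Defeated-by-Evolution outcome.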

 

3) Troubles with extrapolation

3a) The Small Accretions Objection: Are small accretions of intelligence analogous to small accretions of time in terms of identity? Is the extrapolation of person X still a reasonable political representative of X? Are X's values desirably preserved when she is given small accretions of intelligence? Would X allow her extrapolation to vote for her?

This objection is made through an analogy. For ages, philosophers have argued about the immortality of the soul, the existence of the soul, the complexity of the soul, and, last but not least, the identity of the soul with itself over time.

Advances in philosophy are sparse and usually controversial, and if we were depending on a major advance in our understanding of the complexity of the soul, we'd be in a bad situation. Luckily, our analogy relies on the issue of personal identity, which appears to have been treated in sufficient detail in Reasons and Persons, Derek Parfit's major contribution to philosophy, covering cases from fission and fusion to teleportation and identity over time. It is identity over time which concerns us here: are you the same person as the person you were yesterday? How about one year ago? Or ten years? Parfit has helped the philosophical community by reframing the essential question: instead of asking whether X is the same over time, he asks whether personal identity is what matters, that is, that which we want to preserve when we deny others the right to shoot us. More recently he developed the question in full detail in his “Is Personal Identity What Matters?” (2007), a long article where all the objections to his original view are countered in minute detail.

We are left with a conception on which identity over time is not what matters, and psychological connectedness is the best candidate to take its place. Personal identity is dissolved into a quantitative, not qualitative, question: how much are you the same as the person you were yesterday? Here a percentage enters the field, and once you know how much you are like the person you were yesterday, there is no further question about how much you are the person you were yesterday. We had been asking the wrong question for a long time, and we risk doing the same thing with CEV. What if extrapolation is a process that dissolves that which matters about us and our volitions? What if there is no transitivity of what matters between me and me+1 or me+2 on the intelligence scale? Then extrapolating me will not preserve what had to be preserved in the first place. To extrapolate our volition as if we knew more, thought faster, and had grown up farther together is to accrue small quantities of intelligence during the dynamic, and doing this may be risky. Even if some of our possible extrapolations would end up generating part of a Nice Place to Be, we must make sure that none of the other possible extrapolations actually happens - that is, that CEV doesn't extrapolate in a way in which, at each step of extrapolation, one slice of what matters is lost. Just as small accretions of time make you every day less the person you were back in 2010, small accretions of intelligence may displace us from what is to be preserved. Maybe smarter versions of ourselves are not us at all - this is the Small Accretions Objection.
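The arithmetic behind the objection is simple, and a toy sketch makes it vivid. The per-step retention figure below is invented; nothing in the CEV document assigns such a number. The point is only that if each small accretion of intelligence preserves a fixed fraction of what matters, the losses compound across extrapolation steps.

```python
# Toy sketch: compounding loss of "what matters" across extrapolation steps.
# The 95% per-step retention figure is made up for illustration.

retention_per_step = 0.95
for steps in (1, 10, 50, 100):
    remaining = retention_per_step ** steps
    print(f"{steps:>3} accretions of intelligence: {remaining:.1%} of what matters remains")
# 100 individually harmless steps leave under 1% -- the same way small accretions of time
# quietly erode your connectedness to the person you were in 2010.
```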


4) Problems with the concept of Volition

4a) Blue-minimizing robots (Yvain's post)

4b) Goals vs. Volitions

The machine's actions should be grounded in our preferences, but those preferences are complex and opaque, making our reports unreliable. To truly determine people's volitions, there must be a previously recognized candidate predictor. We test the predictor on its ability to describe current humans' volitions before we give it the task of comprehending extrapolated human volition.
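A minimal sketch of the kind of test this paragraph gestures at - every interface name and the threshold below are hypothetical, not anything the CEV document specifies: hold out a sample of present-day people, compare the predictor's descriptions of their current volitions against what they actually report and do, and only a predictor that passes is handed the far harder extrapolation task.

```python
# Hypothetical certification loop for a volition predictor.
# `predictor`, the person objects, and `agreement` are invented interfaces.

def certify_volition_predictor(predictor, holdout_people, agreement, threshold=0.9):
    """Approve `predictor` for extrapolation work only if it can already describe
    the *current* volitions of held-out, present-day people.

    `agreement` is any similarity measure (0..1) between a predicted volition and
    the preferences the person actually reports and reveals."""
    scores = [
        agreement(predictor.describe_current_volition(person),
                  person.reported_and_revealed_preferences())
        for person in holdout_people
    ]
    return sum(scores) / len(scores) >= threshold
```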

4c) Want to want vs. would want if we thought faster and had grown up farther together

Eliezer suggests in CEV that we should consider it a mistake to give Fred box A if he asked for box A while thinking it contained the diamond, when we know both that box B contains the diamond and that Fred wants the diamond. Fred's volition, we are told, is to have the diamond, and we must be careful to create machines that extrapolate volition, not mere wanting. This is good, but not enough. There is a sub-area of moral philosophy dedicated to understanding that which we value, and even though it may seem at first glance that we value our volitions, the process that leads from wanting to having a volition is different from the process that leads from wanting to having a value. Values, as David Lewis has argued, are what we want to want. Volitions, on the other hand, are what we would ultimately want under less stringent conditions. Currently CEV does not consider the iterated aspect of the things we value (the want-to-want aspect). This is problematic if our volitions do not happen to be constrained by what we value, that is, by what we desire to desire. Suppose Fred knows that the diamond he thinks is in box A comes from a bloody conflict region. Fred hates bloodshed and truly desires not to have desires for diamonds; he wants to be a person who doesn't want diamonds from conflict regions. Yet the flesh is weak and Fred, under the circumstances, really wants the diamond. Both Fred's current volition and Fred's extrapolated volition would have him choose box B, if only he knew, and in neither case have Fred's values been duly considered. It may be argued that a good enough extrapolation would end up considering his disgust at war, but here we are talking not about a quantitative issue (how much improvement there was) but a qualitative leap (what kind of thing should be preserved). If it is the case, as I argue here, that we ought to preserve what we want to want, this must be done as a separate consideration, not as an addendum to preserving our volitions, both current and extrapolated.
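To keep the three notions apart, here is a toy model of Fred's case. The representation is invented for illustration; CEV specifies no such structure. His first-order want, his informed (extrapolated) want, and his second-order want-to-want come apart, and a dynamic that only extrapolates the first two never consults the third.

```python
# Toy model of Fred and the two boxes; the representation is invented for illustration.

fred = {
    # First-order want, resting on a false belief about which box holds the diamond:
    "current_want": "box A",
    # What he would choose if he knew the (conflict) diamond is in box B:
    "extrapolated_want": "box B",
    # Second-order desire: he wants to be someone who doesn't want conflict diamonds:
    "wants_to_want": "to not want the diamond at all",
}

def naive_extrapolation_dynamic(person):
    """A dynamic that extrapolates wanting but never consults want-to-want."""
    return person["extrapolated_want"]

print(naive_extrapolation_dynamic(fred))  # 'box B' -- the conflict diamond is delivered
print(fred["wants_to_want"])              # never enters the computation
```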

 

Continues in Part2

Comments (10)

There is little in common between Eliezer, me, and Al Qaeda terrorists, and most of it is in the so-called reptilian brain. We may end up with a set of goals and desires that are nothing more than “Eat Survive Reproduce,” which would qualify as a major loss in the scheme of things.

I think you may possibly be committing the fundamental attribution error. It's my understanding that Al Qaeda terrorists are often people who were in a set of circumstances that made them highly susceptible to propaganda - often illiterate, living in poverty, and with few, if any, prospects for advancement. It is easy to manipulate the ignorant and disenfranchised. If they knew more, saw the possibilities, and understood more about the world, I would be surprised if they would choose a path that diverges so greatly from your own that CEV would have to resort to looking at the reptilian brain.

In this specific case, what ends up dominating CEV is what evolution wants, not what we want. Instead of creating a dynamic with a chance of creating the landscape of a Nice Place to Live, we end up with some exotic extrapolation of simple evolutionary drives.

"What evolution wants" doesn't seem like a clear concept - at least I'm having trouble making concrete sense of it. I think that you're conflating "evolution" with "more ancient drives" - the described extrapolation is an extrapolation with respect to evolutionarily ancient drives.

In particular, you seem to be suggesting that a CEV including only humans will coincide with a CEV including all vertebrates possessing a reptilian brain, on the basis that our current goals seem wildly incompatible. However, as I understand it, CEV asks what we would want if we "knew more, grew up further together" etc.

It's my understanding that Al Qaeda terrorists are often people who were in a set of circumstances that made them highly susceptible to propaganda - often illiterate, living in poverty, and with few, if any, prospects for advancement. It is easy to manipulate the ignorant and disenfranchised.

No, that is completely wrong: the correlations are quite the opposite way, terrorists tend to be better educated and wealthier. Bin Laden is the most extreme possible example of that - he was a multimillionaire son of a billionaire!

If they knew more, saw the possibilities, and understood more about the world, I would be surprised if they would choose a path that diverges so greatly from your own

This is not so simple to assert. You have to think of the intensity of their belief in the words of Allah. Their fundamental worldview is so different from ours that there may be nothing humane left when we try to combine them.

I think that you're conflating "evolution" with "more ancient drives"

In this specific case I was using this figure of speech, yes. I meant that we would be extrapolating drives that matter for evolution (our ancient drives) but don't really matter to us, not in the sense of Want to Want described in 4c.

This is not so simple to assert. You have to think of the intensity of their belief in the words of Allah. Their fundamental worldview is so different from ours that there may be nothing humane left when we try to combine them.

CAVEAT: I'm using CEV as I understand it, not necessarily as it was intended as I'm not sure the notion is sufficiently precise for me to be able to accurately parse all of the intended meaning. Bearing that in mind:

If CEV produces a plan or AI to be implemented, I would expect it to be sufficiently powerful that it would entail changing the worldviews of many people during the course of implementation. My very basic template would be that of Asimov's The Evitable Conflict - the manipulations would be subtle, and we would be unlikely to read their exact outcomes at a given time X without implementing them (this would be dangerous, as it means you can't "peek ahead" at the future you cause), though we could still prove that at the end we will be left with a net gain in utility. The Asimov example is somewhat less complex, and does not seek to create the best possible future, only a fairly good, stable one, but the basic notion I am borrowing is relevant to CEV.

The drives behind the conviction of the suicide bomber are still composed of human drives, evolutionary artifacts that have met with a certain set of circumstances. The Al Qaeda example is salient today because the ideology is among the most uncontroversially damaging ideologies we can cite. However, I doubt that any currently held ideology or belief system held by any human today is ideal. The CEV should search for ways of redirecting human thought and action - this is necessary for anything that is meant to have global causal control. The CEV does not reconcile current beliefs and ideologies; it seeks to redirect the course of human events to bring about new, more rational, more efficient, and healthier ideologies that will be compatible, if this can be done.

If there exists some method for augmenting our current beliefs and ideologies to become more rational, more coherent and more conducive to positive change, then the CEV should find it. Such a method would allow for much more utility than the failure mode you describe, and said failure mode should only occur when such a method is intractable.

In this specific case I was using this figure of speech, yes. I mean't that we would be extrapolating drives that matter for evolution (our ancient drives) but don't really matter to us, not in the sense of Want to Want described in 4c.

My point is that, in general, our drives are a product of evolutionary drives, and are augmented only by context. If the context changes, those drives change as well, but both the old set and new set are comprised of evolutionary drives. CEV changes those higher level drives by controlling the context in sufficiently clever ways.

CEV should probably be able to look at how an individual will develop in different contexts and compute the net utility in each one, and then maximize. The danger here is that we might be directed into a course of events that leads to wireheading.

It occurs to me that the evolutionary failure mode here is probably something like wireheading, though it could be more general. As I see it, CEV is aiming to maximize total utility while minimizing net negative utility for as many individuals as possible. If some percentage of individuals prove impossible to direct towards a good future without causing massive disutility in general, we have to devise a way to look at each such case and ask what sorts of individuals are not getting a pay-off. If it turns out to be a small number of sociopaths, this will probably not be a problem; I expect that we will have the technology to treat sociopaths and bring them into the fold. CEV should consider this possibility as well. If it turns out to be a small number of very likable people, it could be somewhat more complicated, and we should ask why this is happening. I can't think of any reasonable possible scenarios for this at the moment, but I think it is worth thinking about more.

The kernel of this problem is very central to CEV as I understand it, so I think it is good to discuss it in as much detail as we can in order to glean insight.

2a - If volition depends on emotional state, what we want is a me+ who is able to have any of these emotional states, but is not stuck in any one of them. Me+ will grok the states of chocolate-in-hand, chocolate-in-mouth, and fat-on-hips, taking on each emotional set in turn, and then consider the duration as well as the character of each experience. I don't see this as especially problematic, beyond the way that every psychological simulation/prediction is challenging.

3a - Not all psychological changes are problematic for what matters. Parfit has been criticized (unfairly?) on this very point, especially when it comes to changes that are increases in knowledge and rationality. (It may be a misreading of him to infer that all changes count as decreased connectedness over time.) Whenever we try to reason out what it is that we really want, we show a commitment to rationality. We can hardly complain if our criterion of "what we really want" includes increased rationality on the search path.

4c - If "want to want" can't be leveraged into just plain want, in the agent's most rational moments, I suspect it's just hot air. Sometimes "akrasia" isn't, and stated goals are sometimes abandoned on reflection.

In this specific case, what ends up dominating CEV is what evolution wants, not what we want.

Possibly. It also sounds like the best part of Robert Heinlein's Good Outcome for the future. I think we can do better -- but you seem to be arguing for the claim that we can't. Still beats paperclips, or even true orgasmium.

We can do better if we take this kind of problem into consideration. If there is too much of what Eliezer calls spread and muddle, we may end up just evolving faster. I don't think blind, faster evolution would be at the top of anyone's list of desires.

One of my issues with the interpersonal coherence part: you are splitting off part of someone's wish - the incoherent part - but the remainder may be something that was only desired in the context of the full wish. For example, if people coherently wish for people to have superpowers but have incoherent preferences about dealing with the more powerful criminals that result.

Is this a good summary of the ideas presented here or did I miss something important?

1) We need a correct definition for all the (apparently very fuzzy) concepts that CEV relies upon.

2a) People appear to have multiple "selves"; the preferences of each "self" are more consistent than the aggregation of all of them.

2b) If you strip away all the incoherent preferences, you might strip away most of the stuff you really care about.

3) A much smarter version of me does not resemble me any more. That person's preferences are not my preferences.

4a) We are behavior-executors, not utility-maximizers. The notion of a "preference" or "goal" exists in the map, not the territory. Asking "what is someone's true preference" is like asking whether it's a blegg or a rube, etc.

4b) Our reports of our own preferences are unreliable.

4c) CEV doesn't appear to address "Want to want".

You have not considered the failure mode called "Defeated by Evolution".

Other than that, it is a great, really short summary. Why don't you do the same for Part2? :)