Complexity of Value ≠ Complexity of Outcome

Wei Dai

30th Jan 2010

Complexity of value is the thesis that our preferences, the things we care about, don't compress down to one simple rule, or a few simple rules. To review why it's important (by quoting from the wiki):

Caricatures of rationalists often have them moved by artificially simplified values - for example, only caring about personal pleasure. This becomes a template for arguing against rationality: X is valuable, but rationality says to only care about Y, in which case we could not value X, therefore do not be rational.
Underestimating the complexity of value leads to underestimating the difficulty of Friendly AI; and there are notable cognitive biases and fallacies which lead people to underestimate this complexity.

I certainly agree with both of these points. But I worry that we (at Less Wrong) might have swung a bit too far in the other direction. No, I don't think that we overestimate the complexity of our values, but rather there's a tendency to assume that complexity of value must lead to complexity of outcome, that is, agents who faithfully inherit the full complexity of human values will necessarily create a future that reflects that complexity. I will argue that it is possible for complex values to lead to simple futures, and explain the relevance of this possibility to the project of Friendly AI.

The easiest way to make my argument is to start by considering a hypothetical alien with all of the values of a typical human being, but also an extra one. His fondest desire is to fill the universe with orgasmium, which he considers to have orders of magnitude more utility than realizing any of his other goals. As long as his dominant goal remains infeasible, he's largely indistinguishable from a normal human being. But if he happens to pass his values on to a superintelligent AI, the future of the universe will turn out to be rather simple, despite those values being no less complex than any human's.

The above possibility is easy to reason about, but perhaps does not appear very relevant to our actual situation. I think that it may be, and here's why. All of us have many different values that do not reduce to each other, but most of those values do not appear to scale very well with available resources. In other words, among our manifold desires, there may be only a few that are not easily satiated when we have access to the resources of an entire galaxy or universe. If so (and assuming we aren't wiped out by an existential risk and don't fall into a Malthusian scenario), the future of our universe will be shaped largely by those values that do scale. (I should point out that in this case the universe won't necessarily turn out to be mostly simple. Simple values do not necessarily lead to simple outcomes either.)
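
To make the scaling claim concrete, here is a toy sketch. Everything in it is an assumption made for illustration and is not part of the original argument: the exponential saturation curve, the number of bounded values, the `scaling_weight` of 0.01, and all function names are hypothetical. It models utility as a sum of several quickly satiated values plus one value that grows linearly with resources, and computes how an optimizer would split a resource budget.

```python
import math

def saturating(x):
    """Hypothetical bounded value: rises toward 1 as resources x grow."""
    return 1.0 - math.exp(-x)

def total_utility(allocation, scaling_weight=0.01):
    """Additive toy utility: the last entry feeds the one value that scales."""
    *bounded, scalable = allocation
    return sum(saturating(x) for x in bounded) + scaling_weight * scalable

def optimal_allocation(resources, n_bounded=5, scaling_weight=0.01):
    """Optimal split for this toy model, assuming 0 < scaling_weight < 1.

    Each bounded value is funded until its marginal return exp(-x) drops
    to scaling_weight (x = ln(1 / scaling_weight)), or until the budget
    runs out; whatever remains goes to the scalable value.
    """
    per_bounded = min(resources / n_bounded, math.log(1.0 / scaling_weight))
    scalable = resources - n_bounded * per_bounded
    return [per_bounded] * n_bounded + [scalable]

def scalable_share(resources, **kwargs):
    """Fraction of the budget spent on the single value that scales."""
    return optimal_allocation(resources, **kwargs)[-1] / resources

for r in (10.0, 100.0, 1e4, 1e6):
    print(f"resources={r:>10.0f}  share on scalable value={scalable_share(r):.5f}")
```

In this model, a small budget goes entirely to the bounded values, even though the scalable value has a tiny weight. As the budget grows, the bounded values are satiated at a fixed cost and nearly all remaining resources flow to the value that scales, which is the pattern the paragraph above describes.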

Now if we were rational agents who had perfect knowledge of our own preferences, then we would already know whether this is the case or not. And if it is, we ought to be able to visualize what the future of the universe would look like, if we had the power to shape it according to our desires. But I find myself uncertain on both questions. Still, I think this possibility is worth investigating further. If it were the case that only a few of our values scale, then we could potentially obtain almost all that we desire by creating a superintelligence with just those values. And perhaps this can be done manually, bypassing an automated preference extraction or extrapolation process with its associated difficulties and dangers. (To head off a potential objection, this does assume that our values interact in an additive way. If there are values that don't scale but interact nonlinearly (multiplicatively, for example) with values that do scale, then those would need to be included as well.)
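
The additive-versus-multiplicative caveat can be illustrated with another toy sketch. Again, this is an assumption-laden illustration rather than anything from the original post: the saturation curve, the weight `w`, the grid search, and the function names are all hypothetical. It compares an agent that pursues only the scalable value against the best achievable split, under each kind of interaction.

```python
import math

def bounded(x):
    """Hypothetical satiable value: rises toward 1 as resources x grow."""
    return 1.0 - math.exp(-x)

def additive(x, y, w=0.01):
    """Values combine by addition: x funds the bounded value, y the scalable one."""
    return bounded(x) + w * y

def multiplicative(x, y, w=0.01):
    """Values combine by multiplication: the bounded value gates the scalable one."""
    return bounded(x) * w * y

def best_value(utility, resources, steps=10_000):
    """Approximate the best achievable utility by grid search over the split."""
    return max(
        utility(resources * i / steps, resources * (1 - i / steps))
        for i in range(steps + 1)
    )

def retained_fraction(utility, resources):
    """Share of the best achievable utility kept by funding only the scalable value."""
    scaling_only = utility(0.0, resources)
    return scaling_only / best_value(utility, resources)

for name, u in (("additive", additive), ("multiplicative", multiplicative)):
    print(name, round(retained_fraction(u, 1000.0), 3))
```

Under the additive form, ignoring the bounded value costs only a bounded amount, so the retained fraction approaches 1 as resources grow. Under the multiplicative form, ignoring it yields nothing at all, which is why such values would have to be included alongside the ones that scale.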

Whether or not we actually should take this approach would depend on the outcome of such an investigation. Just how much of what we desire can feasibly be obtained this way? And how does the loss of value inherent in this approach compare with the expected loss of value due to potential errors in the extraction/extrapolation process? These are questions worth trying to answer before committing to any particular path, I think.

P.S., I hesitated a bit in posting this, because underestimating the complexity of human values is arguably a greater danger than overlooking the possibility that I point out here, and this post could conceivably be used by someone to rationalize sticking with their "One Great Moral Principle". But I guess those tempted to do so will tend not to be Less Wrong readers, and seeing how I already got myself sucked into this debate, I might as well clarify and expand on my position.

223 Comments

Eliezer Yudkowsky (16y)

If the UFAI convinced you of anything that wasn't true during the process - outright lies about reality or math - or biased sampling of reality producing a biased mental image, like a story that only depicts one possibility where other possibilities are more probable - then we have a simple and direct critique.

If the UFAI never deceived you in the course of telling the story, but simple measures over the space of possible moral arguments you could hear and moralities you subsequently develop, produce a spread of extrapolated volitions "almost all" of whom think that the UFAI-inspired-you has turned into something alien and unvaluable - if it flew through a persuasive keyhole to produce a very noncentral future version of you who is disvalued by central clusters of you - then it's the sort of thing a Coherent Extrapolated Volition would try to stop.

See also #1 on the list of New Humane Rights: "You have the right not to have the spread in your volition optimized away by an external decision process acting on unshared moral premises."

PeerInfinity (16y)

New Humane Rights:

You have the right not to have the spread in your volition optimized away by an external decision process acting on unshared moral premises.

You have the right to a system of moral dynamics complicated enough that you can only work it out by discussing it with other people who share most of it.

You have the right to be created by a creator acting under what that creator regards as a high purpose.

You have the right to exist predominantly in regions where you are having fun.

You have the right to be noticeably unique within a local world.

You h... (read more)

Wei Dai (16y)

What about the least convenient world where human meta-moral computation doesn't have the coherence that you assume? If you found yourself living in such a world, would you give up and say no meta-ethics is possible, or would you keep looking for one? If it's the latter, and assuming you find it, perhaps it can be used in the "convenient" worlds as well? To put it another way, it doesn't seem right to me that the validity of one's meta-ethics should depend on a contingent fact like that. Although perhaps instead of just complaining about it, I should try to think of some way to remove the dependency... (We also disagree about the likelihood that the coherence assumption holds, but I think we went over that before, so I'm skipping it in the interest of avoiding repetition.)
