A Difficulty in the Concept of CEV

by [anonymous]

27th Mar 2013


Prerequisite reading: Cognitive Neuroscience, Arrow's Impossibility Theorem, and Coherent Extrapolated Volition.

Abstract: Arrow's impossibility theorem poses a challenge to the viability of coherent extrapolated volition (CEV) as a model for safe-AI architecture: per the theorem, no algorithm for aggregating ordinal preferences over three or more options is guaranteed to satisfy Arrow's four fairness criteria while also producing a transitive preference ordering. One way to exempt CEV from this result is to claim that human preferences are cardinal rather than ordinal, so that Arrow's theorem does not apply. This post argues that this approach ultimately fails, and briefly discusses other options.

A problem arises when CEV is examined from the perspective of welfare economics: according to Arrow's impossibility theorem, no algorithm for aggregating preferences is guaranteed to meet four common-sense fairness criteria (unrestricted domain, non-dictatorship, Pareto efficiency, and independence of irrelevant alternatives) while also producing a transitive result. Luke has previously discussed this challenge. (See the post linked above.)
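To make the transitivity problem concrete, here is a minimal sketch of the classic Condorcet cycle. It is an illustration of one particular rule (pairwise majority voting) failing, not a proof of Arrow's theorem. The three ballots and the function names are hypothetical, chosen only for this example.

```python
from itertools import combinations

# Three hypothetical voters, each with a transitive ranking (best first).
ballots = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(x, y, ballots):
    """Return True if a strict majority of ballots rank x above y."""
    wins = sum(b.index(x) < b.index(y) for b in ballots)
    return wins > len(ballots) / 2

def pairwise_majority(ballots):
    """Collect (winner, loser) pairs for every pair decided by strict majority."""
    options = sorted(ballots[0])
    relation = set()
    for x, y in combinations(options, 2):
        if majority_prefers(x, y, ballots):
            relation.add((x, y))
        elif majority_prefers(y, x, ballots):
            relation.add((y, x))
    return relation

def is_transitive(relation):
    """Check that x > y and y > z always imply x > z."""
    return all(
        (x, z) in relation
        for (x, y) in relation
        for (y2, z) in relation
        if y == y2 and x != z
    )

group = pairwise_majority(ballots)
print(sorted(group))         # the group's pairwise preferences
print(is_transitive(group))  # False: A beats B, B beats C, yet C beats A
```

Every individual ranking here is transitive, but the majority relation cycles. Arrow's theorem says that no rule meeting all four criteria can rule out this kind of failure on every possible set of ballots.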

Arrow's impossibility theorem assumes that human preferences are ordinal, but (as Luke pointed out) recent neuroscientific findings suggest that human preferences are cardinally encoded. If so, this would seem to imply that human preferences, and consequently CEV, are not bound by the consequences of the theorem.

However, Arrow's impossibility theorem extends to cardinal utilities once a continuity axiom is added. This result, known as Samuelson's conjecture, was proven by Ehud Kalai and David Schmeidler in their 1977 paper "Aggregation Procedure for Cardinal Preferences." If an AI attempts to model human preferences using a utility theory that relies on the continuity axiom, the consequences of Arrow's theorem still apply. This includes, for example, an AI built on the von Neumann-Morgenstern utility theorem.

The proof of Samuelson's conjecture narrows the space of viable CEV aggregation procedures. To escape the consequences of Arrow's impossibility theorem, a CEV algorithm must accurately model human preferences without using a continuity axiom. It may be that we live in a second-best world where such models are impossible. In that case, we must make a trade-off between employing a fair aggregation procedure and producing a transitive result.
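One side of that trade-off can be sketched with the Borda count, a rule that always yields a transitive ordering (it ranks options by a numeric score) but gives up one of Arrow's fairness criteria, independence of irrelevant alternatives. The two ballot profiles below are hypothetical and were picked so that no voter changes their ranking of A versus B between them.

```python
def borda_scores(ballots):
    """Borda count: with n options, a ballot gives n-1 points to its top choice, down to 0."""
    scores = {}
    for ballot in ballots:
        n = len(ballot)
        for position, option in enumerate(ballot):
            scores[option] = scores.get(option, 0) + (n - 1 - position)
    return scores

def pairwise_views(ballots, x, y):
    """Each voter's ranking of x versus y, ignoring every other option."""
    return [ballot.index(x) < ballot.index(y) for ballot in ballots]

# Hypothetical profiles: only the position of C changes for the last two voters.
profile_1 = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2
profile_2 = [["A", "B", "C"]] * 3 + [["B", "A", "C"]] * 2

same_views = pairwise_views(profile_1, "A", "B") == pairwise_views(profile_2, "A", "B")
scores_1 = borda_scores(profile_1)
scores_2 = borda_scores(profile_2)

print(same_views)                       # True: nobody changed their mind on A vs B
print(scores_1["B"] > scores_1["A"])    # True: B outranks A in profile 1
print(scores_2["A"] > scores_2["B"])    # True: A outranks B in profile 2
```

The group's ranking of A against B flips even though every individual's view of that pair is unchanged, because the result depends on where the "irrelevant" option C sits.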

Supposing this is the case, what kind of trade-off would be optimal? I am hesitant to weaken the transitivity criterion, because an agent with non-transitive preferences is vulnerable to Dutch-book (money-pump) exploitation. This vulnerability poses a clear existential risk. On the other hand, weakening the independence of irrelevant alternatives criterion may be feasible. My cursory reading of the literature suggests that this is a popular alternative among welfare economists, but there are other choices.
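The worry about non-transitive preferences can be illustrated with a toy money pump. This is a sketch under simplifying assumptions: the agent accepts any swap to an item it strictly prefers and pays a fixed fee each time. The function name, the items, and the fee are all hypothetical.

```python
def run_money_pump(prefers, start, offers, fee, wealth):
    """Accept any swap to a strictly preferred item, paying `fee` each time."""
    holding = start
    for offered in offers:
        if (offered, holding) in prefers:
            holding = offered
            wealth -= fee
    return holding, wealth

# Cyclic preferences: A over B, B over C, and C over A.
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
# Transitive preferences for comparison: A over B over C.
transitive = {("A", "B"), ("B", "C"), ("A", "C")}

offers = ["C", "B", "A"] * 2  # the trader walks the agent around the cycle twice

print(run_money_pump(cyclic, "A", offers, fee=1, wealth=10))      # ('A', 4)
print(run_money_pump(transitive, "A", offers, fee=1, wealth=10))  # ('A', 10)
```

The cyclic agent ends up holding exactly what it started with, six units poorer, and the loop could continue indefinitely. The transitive agent already holds its favorite item and declines every offer.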

Going forward, Arrow's impossibility theorem may serve as one of the strongest objections to CEV. Further consideration of how to reconcile CEV with Arrow's impossibility theorem is warranted.


23 Comments


Nisan, 13y

According to the Kalai and Schmeidler paper, the problem with this is that you're only allowed to know people's utility functions up to translation and scaling. In order to aggregate people's preferences deterministically, you'd have to decide beforehand on a way of assigning a utility function based on revealed preferences (such as normalizing). But, according to Kalai and Schmeidler, this is impossible unless your scheme is discontinuous or gives a different answer when you restrict to a subset of possible outcomes. (E.g., if you normalize so that an agent's least-favorite outcome has decision-theoretic utility 0, then the normalization will be different if you decide to ignore some outcomes.) You probably don't care because:

You don't care about aggregating preferences in a deterministic way; or
You don't care about your aggregation being continuous; or
You don't care if your aggregation gives different answers when you restrict to a subset of outcomes. ("Cardinal independence of irrelevant alternatives" in the paper.)

EDIT: Qiaochu said it first.
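The dependence on the outcome set can be sketched numerically. The example below assumes one specific normalization (min-max scaling of each agent's utilities to the range 0 to 1 over the outcomes under consideration, followed by a plain sum); this is an illustrative choice, not the scheme from the paper. The agents, utilities, and function names are hypothetical, and treating an indifferent agent as all zeros is a further assumption.

```python
from fractions import Fraction

# Hypothetical raw utilities; each agent's scale is arbitrary.
utilities = {
    "ann": {"X": 10, "Y": 8, "Z": 0},
    "bob": {"X": 0, "Y": 1, "Z": 10},
    "cat": {"X": 0, "Y": 1, "Z": 100},
}

def normalize(utility, outcomes):
    """Min-max scale one agent's utilities to [0, 1] over the given outcomes."""
    lo = min(utility[o] for o in outcomes)
    hi = max(utility[o] for o in outcomes)
    if hi == lo:
        # Assumption: an agent indifferent among all outcomes contributes nothing.
        return {o: Fraction(0) for o in outcomes}
    return {o: Fraction(utility[o] - lo, hi - lo) for o in outcomes}

def aggregate(utilities, outcomes):
    """Sum the normalized utilities of all agents for each outcome."""
    totals = {o: Fraction(0) for o in outcomes}
    for utility in utilities.values():
        for o, value in normalize(utility, outcomes).items():
            totals[o] += value
    return totals

full = aggregate(utilities, ["X", "Y", "Z"])
restricted = aggregate(utilities, ["X", "Y"])

print(full["X"] > full["Y"])              # True: X outranks Y when Z is considered
print(restricted["Y"] > restricted["X"])  # True: Y outranks X once Z is ignored
```

Dropping Z changes each agent's normalization, so the aggregate ranking of X against Y flips even though no agent's underlying utilities changed.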

Eliezer Yudkowsky, 13y

Stuart Armstrong has proved some theorems showing that it's really really hard to get to the Pareto frontier unless you're adding utility functions in some sense, with the big issue being the choice of scaling factor. I'm not sure even so, on a moral level - in terms of what I actually want - that I quite buy Armstrong's theorems taken at face value, but on the other hand it's hard to see how, if you had a solution that wasn't on the Pareto frontier, agents would object to moving to the Pareto frontier so long as they didn't get shafted somehow.

It occurre...

Qiaochu_Yuan, 13y

Can we say "being continuous with respect to the particular topology Kalai and Schmeidler chose, which is not obviously the correct topology to choose"? I would have chosen something like the quotient topology. The topology Kalai and Schmeidler chose is based on normalizations and, among other things, isolates the indifferent utility function (the one assigning the same value to all outcomes) from everything else.
