Suppose we want to use the convergence of humanity's preferences as the utility function of a seed AI that is about to determine the future of its light cone.
We figured out how to get an AI to extract preferences from human behavior and brain activity. The AI figured out how to extrapolate those values. But my values and your values and Sarah Palin's values aren't fully converging in the simulation running the extrapolation algorithm. Our simulated beliefs are converging because, on the path to reflective equilibrium, our partially simulated selves have become true Bayesians and Aumann's Agreement Theorem holds. But our preferences aren't converging quite so well.
What to do? We'd like the final utility function in the FOOMed AI to adhere to some common-sense criteria:
- Non-dictatorship: No single person's preferences should dictate what the AI does. Its utility function must take multiple people's (extrapolated) preferences into account.
- Determinism: Given the same choices, and the same utility function, the AI should always make the same decisions.
- Pareto efficiency: If every (extrapolated) person prefers action A to action B, the AI should prefer A to B.
- Independence of irrelevant alternatives: If we — a group of extrapolated preference-sets — prefer A to B, and a new option C is introduced, then we should still prefer A to B regardless of what we think about C.
Now, Arrow's impossibility theorem says that no aggregation rule working only with ordinal rankings (over three or more options) can satisfy all of these criteria at once. So the FOOMed AI's utility function can adhere to them only if the extrapolated preferences of each partially simulated agent are related to each other cardinally ("A is 2.3x better than B!") rather than merely ordinally ("A is better than B, and that's all I can say").
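To see concretely what goes wrong with purely ordinal preferences, here is a minimal sketch in Python (the three agents and their rankings are invented for illustration): aggregating by pairwise majority vote, about as natural an ordinal rule as there is, produces a cyclic "group preference" that no utility function can represent.

```python
from itertools import combinations

# Three extrapolated agents, three options; purely ordinal rankings
# (best to worst). Names and rankings are made up for illustration.
rankings = {
    "me":    ["A", "B", "C"],
    "you":   ["B", "C", "A"],
    "palin": ["C", "A", "B"],
}

def prefers(ranking, x, y):
    """True if this ordinal ranking places x above y."""
    return ranking.index(x) < ranking.index(y)

# Aggregate by pairwise majority vote -- a natural ordinal rule.
for x, y in combinations("ABC", 2):
    votes_x = sum(prefers(r, x, y) for r in rankings.values())
    votes_y = len(rankings) - votes_x
    winner = x if votes_x > votes_y else y
    print(f"{x} vs {y}: group prefers {winner} ({max(votes_x, votes_y)}-{min(votes_x, votes_y)})")

# Output: A beats B, B beats C, C beats A -- a Condorcet cycle.
# The "group preference" is intransitive, so no utility function can
# represent it; this is the kind of obstruction Arrow's theorem generalizes.
```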
If you're an old-school ordinalist about preferences, you might be worried. Ever since Vilfredo Pareto pointed out that cardinal models of a person's preferences go far beyond our behavioral data, and that as far as we can tell utility has "no natural units," some economists have tended to assume that, in our models of human preferences, preferences must be represented ordinally and not cardinally.
But if you're keeping up with the latest cognitive neuroscience, you might not be quite so worried. It turns out that preferences are encoded cardinally after all, and they do have a natural unit: action potentials per second. With cardinally encoded preferences, we can develop a utility function that represents our preferences and adheres to the common-sense criteria listed above.
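For contrast, here is a minimal sketch assuming we could read off cardinal utilities on a common scale (say, normalized firing rates; the numbers are invented for illustration). A simple summation rule is deterministic, non-dictatorial, Pareto-efficient, and independent of irrelevant alternatives. This is only an illustration of the escape route cardinal information opens up, not a claim about how a CEV-style AI would actually aggregate.

```python
# Hypothetical cardinal utilities for the same three options, e.g. read off
# as normalized firing rates. The numbers are invented for illustration.
utilities = {
    "me":    {"A": 0.9, "B": 0.4, "C": 0.1},
    "you":   {"A": 0.2, "B": 0.8, "C": 0.6},
    "palin": {"A": 0.5, "B": 0.3, "C": 0.7},
}

def group_utility(option):
    """Aggregate by simple summation -- one possible cardinal rule."""
    return sum(person[option] for person in utilities.values())

ranking = sorted("ABC", key=group_utility, reverse=True)
print({o: round(group_utility(o), 2) for o in "ABC"})  # {'A': 1.6, 'B': 1.5, 'C': 1.4}
print("group ranking:", ranking)                        # ['A', 'B', 'C']

# This rule always gives the same answer for the same inputs (determinism),
# gives no single agent automatic control of the outcome (non-dictatorship),
# and if everyone scores A above B the sum does too (Pareto efficiency).
# Because each option's score ignores the other options, adding or removing
# a C never flips the group's A-vs-B comparison (independence of irrelevant
# alternatives).
```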
Whaddya know? The last decade of cognitive neuroscience has produced a somewhat interesting result concerning the plausibility of CEV.
This post seems confused, or just confusing.
I don't think there are many people who think that the main problem with aggregating the preferences of different people is ordinal utilities and Arrow's impossibility theorem. Modern economists tend to think about preferences in the von Neumann-Morgenstern tradition, where one's preferences are represented as a utility function from outcomes to real numbers, but any two utility functions that are positive affine transformations of each other are equivalent (so really each person's preferences are represented by an infinite family of utility functions that are all positive affine transformations of each other).
How to aggregate the preferences of individuals with vNM preferences is still considered an open problem, because there is no obvious "natural" way to combine two such infinite families of utility functions. A given agent might internally represent its preferences using one particular utility function out of the infinite family of equivalent ones, but it seems morally indefensible to use that arbitrary choice as the basis for aggregating utility.
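A minimal sketch of that problem, with invented numbers: two agents' vNM preferences, each equally well represented by any positive affine transform of a utility function, summed naively. Which "equivalent" representative you happen to pick determines the group's choice.

```python
# Two agents' vNM preferences over two options. Each agent's preferences are
# equally well represented by any positive affine transform a*u + b (a > 0).
# The particular numbers below are invented for illustration.
u_alice = {"A": 1.0, "B": 0.0}      # Alice: A over B
u_bob   = {"A": 0.0, "B": 1.0}      # Bob:   B over A

def affine(u, a, b):
    """An equivalent representative of the same vNM preferences (a > 0)."""
    return {o: a * v + b for o, v in u.items()}

def summed_winner(u1, u2):
    """Naive aggregation: pick the option with the highest summed utility."""
    totals = {o: u1[o] + u2[o] for o in u1}
    return max(totals, key=totals.get)

# Same preferences, different (equally valid) representatives:
print(summed_winner(u_alice, u_bob))                   # tie; max() keeps 'A'
print(summed_winner(affine(u_alice, 10, 0), u_bob))    # -> 'A'
print(summed_winner(u_alice, affine(u_bob, 10, 0)))    # -> 'B'

# Neither agent's preferences changed, yet the "group choice" flipped.
# Summation only becomes well-defined after choosing a normalization,
# and no particular normalization is obviously the canonical one.
```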
Why would there be a unique way to aggregate personal utility functions? It's just like a belief in "the" objective morality, only one step removed: instead one now looks for "the" way to aggregate personal moralities.
It's probably naive and psychologically false to imagine that there is an impersonal formula for CEV waiting to be found in the human decision architecture, even when self-idealized. The true nature of human morality is probably something like: self-interest plus sympathy plus plasticity (I don't mean neural plasticity, j...