This is a post about moral philosophy, approached with a mathematical metaphor.
Here's an interesting problem in mathematics. Let's say you have a graph, made up of vertices and edges, with weights assigned to the edges. Think of the vertices as US cities and the edges as roads between them; the weight on each road is the length of the road. Now, knowing only this information, can you draw a map of the US on a sheet of paper? In mathematical terms, is there an isometric embedding of this graph in two-dimensional Euclidean space?
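In symbols, this is just a restatement of the same question, with w_ij standing for the given length of the road between cities i and j, and n the number of cities:

```latex
\text{Find } x_1, \dots, x_n \in \mathbb{R}^2 \ \text{ such that } \ \lVert x_i - x_j \rVert = w_{ij} \ \text{ for every edge } (i, j).
```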
When you think about this for a minute, it's clear that this is a problem about reconciling the local and the global. Start with New York and all its neighboring cities. You have a sort of star shape. You can certainly draw this on the plane; in fact, you have many degrees of freedom, so you can arbitrarily pick one way to draw it. Now start adding more cities and more roads, and eventually the degrees of freedom diminish. If you made the wrong choices earlier on, you might paint yourself into a corner and have no way to keep all the distances consistent when you add a new city. This is known as a "synchronization problem": getting it to work locally is easy; getting all the local pieces reconciled with each other is hard.
This is a lovely problem, and some acquaintances of mine have written a paper about it. (http://www.math.princeton.edu/~mcucurin/Sensors_ASAP_TOSN_final.pdf) I'll pick out some insights that seem relevant to what follows. First, some obvious approaches don't work very well. It might be thought that we should optimize over all possible embeddings, picking the one with the lowest error in approximating the distances between cities: you come up with a "penalty function" that is some sort of sum of errors, and use standard optimization techniques to minimize it. The trouble is, these approaches tend to work spottily -- in particular, they sometimes pick out local rather than global optima (so that the error can be quite high after all).
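To make the naive approach concrete, here is a minimal sketch of it in Python (my own toy, not code from the paper): write the penalty as a sum of squared errors between the embedded distances and the given road lengths, and hand it to a generic optimizer. The cities, roads, and lengths are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: 5 "cities" with known positions (used only to generate the road
# lengths); the optimizer sees just the lengths, as in the original problem.
true_pos = np.array([[0, 0], [4, 0], [4, 3], [0, 3], [2, 5]], dtype=float)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (2, 4), (3, 4)]
lengths = {e: np.linalg.norm(true_pos[e[0]] - true_pos[e[1]]) for e in edges}

def penalty(flat_coords):
    """Sum of squared errors between embedded distances and given lengths."""
    x = flat_coords.reshape(-1, 2)
    return sum((np.linalg.norm(x[i] - x[j]) - lengths[(i, j)]) ** 2
               for (i, j) in edges)

# Run the same optimizer from several random starting layouts.  A run that
# finds a consistent map ends with residual near zero; a run that gets stuck
# in a local optimum ends with the residual still clearly above zero.
for seed in range(5):
    start = np.random.default_rng(seed).normal(size=10)
    result = minimize(penalty, start)
    print(f"start {seed}: residual penalty = {result.fun:.4f}")
```

The spread of residuals across starting points is the "spottiness": the penalty landscape has valleys that are locally as good as it gets but globally wrong.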
The approach in the paper I linked is different. We break the graph into overlapping smaller subgraphs, so small that they can only be embedded in one way (that's called rigidity), and then "stitch" them together consistently. The "stitching" is done with a very handy trick involving eigenvectors of sparse matrices. But the point I want to emphasize here is that you have to look at the small scale, and let all the little patches embed themselves as they like, before trying to reconcile them globally.
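The actual algorithm in the paper is more involved than I want to describe, but the flavor of the eigenvector trick can be shown on a stripped-down stand-in for the stitching step (my own sketch, not the authors' code). Suppose each patch has already embedded itself and only needs to be rotated to agree with its neighbors; from noisy measurements of the relative rotations between overlapping patches, the top eigenvector of a single sparse Hermitian matrix recovers every patch's rotation at once, up to one global rotation of the whole map.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(0)

# Hypothetical setup: 50 patches, each with an unknown rotation angle.
n = 50
true_theta = rng.uniform(0, 2 * np.pi, n)

# For a sparse set of overlapping pairs (k, l) we observe the relative
# rotation theta_k - theta_l, corrupted by a little noise.
rows, cols, vals = [], [], []
for k in range(n):
    for l in rng.choice(n, size=5, replace=False):
        if k == l:
            continue
        delta = true_theta[k] - true_theta[l] + 0.05 * rng.standard_normal()
        rows += [k, l]
        cols += [l, k]
        vals += [np.exp(1j * delta), np.exp(-1j * delta)]

H = coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()

# The top eigenvector of the sparse Hermitian matrix H: the phases of its
# entries are the estimated patch rotations (up to one global rotation).
_, v = eigsh(H, k=1, which='LA')
est_theta = np.angle(v[:, 0])

# Check the estimate against the truth, modulo the global rotation.
offset = np.angle(np.mean(np.exp(1j * (est_theta - true_theta))))
err = np.angle(np.exp(1j * (est_theta - true_theta - offset)))
print("max rotation error (radians):", np.max(np.abs(err)))
```

Notice the shape of the solution: every local measurement goes into one matrix, and a single global computation reconciles them all, rather than fixing patches one at a time and hoping the choices stay consistent.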
Now, rather daringly, I want to apply this idea to ethics. (This is an expansion of a post people seemed to like: http://lesswrong.com/lw/1xa/human_values_differ_as_much_as_values_can_differ/1y )
The thing is, human values differ enormously. The diversity of values is an empirical fact. The Japanese did not have a word for "thank you" until the Portuguese gave them one; this is a simple example, but it absolutely shocked me, because I thought "thank you" was a universal concept. It's not. (Edited for lack of fact-checking.) And we do not all agree on what virtues are, or what the best way to raise children is, or what the best form of government is. There may be no principle that all humans agree on -- dissenters who believe that genocide is a good thing may be pretty awful people, but they undoubtedly exist. Creating the best possible world for humans is a synchronization problem, then -- we have to figure out a way to balance values that inevitably clash. Here, the vertices are individuals, each individual is tied to its neighbors, and a choice of embedding is a particular action. The worse the embedding near an individual fits the "true" underlying manifold, the greater the "penalty function" and the more miserable that individual is, because the action goes against what he values.
If we can extend the metaphor further, this is a problem for utilitarianism. Maximizing something globally -- say, happiness -- can be a dead end. It can hit a local maximum -- the maximum for those people who value happiness -- but do nothing for the people whose highest value is loyalty to their family, or truth-seeking, or practicing religion, or freedom, or martial valor. We can't really optimize, because a lot of people's values are other-regarding: we want Aunt Susie to stop smoking, because of the principle of the thing. Or more seriously, we want people in foreign countries to stop performing clitoridectomies, because of the principle of the thing. And Aunt Susie or the foreigners may feel differently. When you have a set of values that extends to the whole world, conflict is inevitable.
The analogue to breaking down the graph is to keep values local. You have a small star-shaped graph of people you know personally and actions you're personally capable of taking. Within that star, you define your own values: what you're ready to cheer for, work for, or die for. You're free to choose those values for yourself -- you don't have to drop them because they're perhaps not optimal for the world's well-being. But beyond that radius, opinions are dangerous: both because you're more ignorant about distant issues, and because you run into this problem of globally reconciling conflicting values. Reconciliation is only possible if everyone is minding their own business, if things are really broken down into rigid components. It's something akin to what Thomas Nagel said against utilitarianism:
"Absolutism is associated with a view of oneself as a small being interacting with others in a large world. The justifications it requires are primarily interpersonal. Utilitarianism is associated with a view of oneself as a benevolent bureaucrat distributing such benefits as one can control to countless other beings, with whom one can have various relations or none. The justifications it requires are primarily administrative." (Mortal Questions, p. 68.)
Anyhow, trying to embed our values on this dark continent of a manifold seems to require breaking things down into little local pieces. I think of that as "cultivating our own gardens," to quote Candide. I don't want to be so confident as to have universal ideologies, but I think I may be quite confident and decisive in the little area that is mine: my personal relationships; my areas of expertise, such as they are; my own home and what I do in it; everything that I know I love and is worth my time and money; and bad things that I will not permit to happen in front of me, so long as I can help it. Local values, not global ones.
Could any AI be "friendly" enough to keep things local?
I would argue that deriving principles via the categorical imperative is a very difficult optimization problem, and that there is a very meaningful sense in which one is then a deontologist and not a utilitarian. If one is a deontologist, one needs to solve a series of constraint-satisfaction problems with hard constraints (i.e., constraints that cannot be violated). In the Kantian approach: given a situation, one has to derive, via moral thinking, the constraints under which one must act in that situation, and then one must act in accordance with those constraints.
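A toy rendering of that picture, with made-up actions and rules of my own (nothing here is meant as actual Kant exegesis): the deontologist lists the available actions, derives the hard constraints, and discards every action that violates any of them.

```python
# Hypothetical actions and hard constraints, purely for illustration.
actions = ["lie to protect a friend", "tell the truth", "say nothing"]

constraints = {
    "do not lie": lambda a: "lie" not in a,
    "do not abandon your friend": lambda a: a != "say nothing",
}

# The deontologist's "solver": keep only actions violating no constraint.
permissible = [
    a for a in actions
    if all(ok(a) for ok in constraints.values())
]
print(permissible)  # ['tell the truth'] -- the constraints leave one action
```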
This is very closely related to combinatorial optimization problems. I would argue that often there is a "moral dual" (in the sense of a dual program) in which those constraints are no longer treated as absolute: you assign a different cost to each violation, and you can then find a most moral strategy. I think very often we have something akin to strong duality, where the utilitarian dual is equivalent to the deontological problem, but it's an important distinction to remember that the deontologist has hard constraints and zero gradient on their objective function (by some interpretations).
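And a toy rendering of the "dual" reading, again with made-up numbers: each rule gets a finite violation cost, a utility term is added, and the action with the best penalized score wins. When some action satisfies every rule, the two formulations agree; when nothing does, the hard-constraint problem is infeasible while the penalized one still returns an answer.

```python
# Same toy actions, but now each violated rule carries a cost instead of
# being absolutely forbidden (a penalized, dual-style formulation).
actions = ["lie to protect a friend", "tell the truth", "say nothing"]

utility = {"lie to protect a friend": 5.0, "tell the truth": 2.0, "say nothing": 1.0}
violation_cost = {
    "do not lie": (lambda a: "lie" in a, 10.0),
    "do not abandon your friend": (lambda a: a == "say nothing", 4.0),
}

def penalized_score(a):
    """Utility minus the cost of every rule the action violates."""
    return utility[a] - sum(cost for violated, cost in violation_cost.values()
                            if violated(a))

best = max(actions, key=penalized_score)
print(best, penalized_score(best))  # 'tell the truth' again: no rule is broken
```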
The utilitarian performs a search over a continuous space for the greatest expected utility, while the deontologist (in an extreme case) has a discrete set of choices, from which the immoral ones are successively weeded out.
Both are optimization procedures, and they can be shown to produce very similar output behavior, but the approach and philosophy are very different. The predictions of the behavior of the deontologist and the utilitarian can become quite different under the sorts of situations that moral philosophers love to come up with.
If all you require is to not violate any constraints, and you have no preference between worlds where equal numbers of constraints are violated, and you can regularly achieve worlds in which no constraints are violated, then perhaps constraint-satisfaction is qualitatively different.
In the real world, linear programming typically involves a combination of hard constraints and penalized constraints.
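A minimal sketch of that mix, with made-up coefficients of my own: one constraint stays hard, the other is softened by a slack variable whose size is charged to the objective.

```python
from scipy.optimize import linprog

# Maximize 3*x1 + 2*x2, i.e. minimize the negative, with:
#   hard constraint:      x1 + x2 <= 4        (may never be violated)
#   penalized constraint: x1 <= 1, softened to x1 - s <= 1 with slack s >= 0,
#                         where each unit of slack costs 5 in the objective.
c = [-3.0, -2.0, 5.0]                 # objective on (x1, x2, s)
A_ub = [[1.0, 1.0, 0.0],              # x1 + x2  <= 4   (hard)
        [1.0, 0.0, -1.0]]             # x1 - s   <= 1   (soft via slack)
b_ub = [4.0, 1.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)

x1, x2, s = res.x
print(f"x1={x1:.2f}, x2={x2:.2f}, slack used={s:.2f}")
# With this penalty the optimum keeps the slack at zero; a cheap enough
# penalty (try 0.5 instead of 5.0) makes violating the soft constraint pay.
```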