I suppose I count as someone who favors "organismal" preferences, rather than confusing the metaphorical "preferences" of our genes with those of the individual. I think your argument against this position is pretty weak.
You claim that favoring the "organismal" over the "evolutionary" fails to accurately identify our values in four cases, but I fail to see any problem with these cases.
I dispute this. The SIAI FAI is specifically designed to have control of the universe as one of its goals.
It is widely expected that this will arise as an important instrumental goal; nothing more than that. I can't tell if this is what you mean. (When you point out that "trying to take over the universe isn't utility-maximizing under many circumstances", it sounds like you're thinking of taking over the universe as a separate terminal goal, which would indeed be terrible design; an AI without that terminal goal, that can reason the same way you can, can decide not to try to take over the universe if that looks best.)
I notice that some of my comment wars with other people arise because they automatically assume that whenever we're talking about a superintelligence, there's only one of them. This is in danger of becoming a LW communal assumption. It's not even likely.
I probably missed it in some other comment, but which of these do you not buy: (a) huge first-mover advantages from self-improvement (b) preventing other superintelligences as a convergent subgoal (c) that the conjunction of these implies that a singleton superintelligence is likely?
(More generally, there's a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about - even if he doesn't insist that they are likely.)
This sounds plausible and bad. Can you think of some other examples?
(More generally, there's a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about - even if he doesn't insist that they are likely.)
This is probably just availability bias. These scenarios are easy to recall because we've read about them, and we're psychologically primed for them just by coming to this website.
I may be a little slow and missing something, but here are my jumbled thoughts.
I found moral nihilism convincing for a brief time. The argument seems convincing: just about any moral statement you can think of, some people on earth have rejected it. You can't appeal to universal human values... we've tried, and I don't think there's a single one that has stood up to scrutiny as actually being literally universal. You always end up having to say, "Well, those humans are aberrant and evil."
Then I realized that there must be something more complicated going on. Else how explain the fact that I am curious about what is moral? I've changed my mind on moral questions -- pretty damn foundational ones. I've experienced moral ignorance ("I don't know what is right here.") I don't interact with morality as a preference. Or, when I do, sometimes I remember not to, and pull myself back.
I know people who claim to interact with morality as a preference -- only "I want to do this," never "I must do this." I'm skeptical. If you could really have chosen any set of principles ... why did you happen to choose principles that match pretty well with...
I've been reading Bury the Chains, a history of British abolitionism, and the beginning does give the impression of morals as something to be either discovered or invented.
The situation starts with the vast majority in Britain not noticing that there was anything wrong with slavery. A slave ship captain who later became a prominent abolitionist is working on improving his morals -- by giving up swearing.
Once slavery became a public issue, opposition to it grew pretty quickly, but the story was surprising to me because I thought of morals as something fairly obvious.
Fantastic post. Not-so-fantastic title, especially since your real point seems to be more like, "only humans in a human environment can have human values". ISTM that, "Can human values be separated from humans (and still mean anything)?" might be both a more accurate title, and more likely to get a dissenter to read it.
Besides, Clippy, a paperclip is just a staple that can't commit.
And a staple is just a one-use paperclip.
So there.
Maybe sometime I'll write a post on why I think the paperclipper is a strawman. The paperclipper can't compete; it can happen only if a singleton goes bad.
I think everyone who talks about paperclippers is talking about singletons gone bad (or rather, singletons that started out bad and have reached reflective consistency).
This is extremely confused. Wireheading is an evolutionary dead-end because wireheads ignore their surroundings. Paperclippers and, for that matter, staplers and FAIs pay exclusive attention to their surroundings and ignore their terminal utility functions except to protect them physically. It's just that after acquiring all the resources available, clippy makes clips and Friendly makes things that humans would want if they thought more clearly, such as the experience of less-clear-thinking humans eating ice cream.
What?
If we accept these semantics (a collection of clear thinkers is a "singleton" because you can imagine drawing a circle around them and labelling them a system), then there's no requirement for the thinkers to be clear, or to communicate cheaply. We are a singleton already.
Then the word singleton is useless.
No one generally has to attempt to agree with other agents about ethics; they only have to take actions that take into account the conditional behaviors of others.
This is playing with semantics to sidestep real issues. No one "has to" attempt to agree with other agents, in the same sense that no one "has to" achieve their goals, or avoid pain, or live.
You're defining away everything of importance. All that's left is a universe of agents whose actions and conflicts are dismissed as just a part of computation of the great Singleton within us all. Om.
This article is a bit long. If it would not do violence to the ideas, I would prefer it had been broken up into a short series.
I think you're altogether correct, but with the caveat that "Friendly AI is useless and doomed to failure" is not a necessary conclusion of this piece.
Any one of us has consistent intuitions
I think this is false. Most of us have inconsistent intuitions, just like we have inconsistent beliefs. Though this strengthens, not undermines, your point.
...This means that our present ethical arguments are largely the result of cultural change over the past few thousand years...
Another post with no abstract. The title does a poor job as a 1-line abstract. Failure to provide an abstract creates an immediate and powerful negative impression on me. If the 1234567 section was intended as an abstract, it was on page 2 for me - and I actually binned the post before getting that far initially. Just so you know.
I feel like this post provides arguments similar to those I would have given if I were mentally more organized. For months I've been asserting (without argument), "Don't you see? Without 'absolute values' to steer us, optimizing over preferences is incoherent." The incoherence stems from the fact that our preferences are mutable, something we modify and optimize over a lifetime, and making distinctions between preferences given to us by genetics, our environmental history, or random chance is too arbitrary. There's no reason to eleva...
First of all, good post.
My main response is, aren't we running on hostile hardware? I am not the entire system called "Matt Simpson." I am a program, or a particular set of programs, running on the system called "Matt Simpson." This system runs lots of other programs. Some, like automatically pulling my hand away after I touch a hot stove, happen to achieve my values. Others, like running from or attacking anything that vaguely resembles a snake, are a minor annoyance. Still others, like the system wanting to violently attack othe...
Great post (really could have used some editing, though).
Where do we go from here, though? What approaches look promising?
As one example, I would say that the extraordinarity bias is in fact a preference. Or consider the happiness paradox: People who become paralyzed become extremely depressed only temporarily; people who win the lottery become very happy only temporarily. (Google 'happiness "set-point"'.) I've previously argued on LessWrong that this is not a bias, but a heuristic to achieve our preferences.
To add to the list, I've suggested before that scope insensitivity is just part of our preferences. I choose the dust specks over torture.
Re: Kuhn. You don't need the postscript to see that he's not arguing for the meaninglessness of scientific progress. For example, he specifically discusses how certain paradigms make other paradigms possible (for example, how one needed impetus theory to get Galilean mechanics and could not go straight from Aristotle to Galileo). Kuhn also emphasizes that paradigm changes occur when there is something that the paradigm cannot adequately explain. Kuhn's views are complicated and quite subtle.
Hitler had an entire ideology built around the idea, as I gather, that civilization was an evil constriction on the will to power; and artfully attached it to a few compatible cultural values.
The idea that civilization is an evil construction which "pollutes the environment and endangers species" is again very popular. That humans and humanity would be good had they never built a technical civilization is the backbone of the modern Green movement.
Ethics is not geometry
Western philosophy began at about the same time as Western geometry; and if you read Plato you'll see that he, and many philosophers after him, took geometry as a model for philosophy.
In geometry, you operate on timeless propositions with mathematical operators. All the content is in the propositions. A proof is equally valid regardless of the sequence of operators used to arrive at it. An algorithm that fails to find a proof when one exists is a poor algorithm.
The naive way philosophers usually map ethics onto mathematics is to suppose that a human mind contains knowledge (the propositional content), and that we think about that knowledge using operators. The operators themselves are not seen as the concern of philosophy. For instance, when studying values (I also use "preferences" here, as a synonym differing only in connotation), people suppose that a person's values are static propositions. The algorithms used to satisfy those values aren't themselves considered part of those values. The algorithms are considered to be only ways of manipulating the propositions; and are "correct" if they produce correct proofs, and "incorrect" if they don't.
But an agent's propositions aren't intelligent. An intelligent agent is a system, whose learned and inborn circuits produce intelligent behavior in a given environment. An analysis of propositions is not an analysis of an agent.
I will argue that:
Instincts, algorithms, preferences, and beliefs are artificial categories
There is no principled distinction between algorithms and propositions in any existing brain. This means that there's no clear way to partition an organism's knowledge into "propositions" (including "preferences" and "beliefs"), and "algorithms." Hence, you can't expect all of an agent's "preferences" to end up inside the part of the agent that you choose to call "propositions". Nor can you reliably distinguish "beliefs" from "preferences".
Suppose that a moth's brain is wired to direct its flight by holding the angle to the moon constant. (This is controversial, but the competing hypotheses would give similar talking points.) If so, is this a belief about the moon, a preference towards the moon, or an instinctive motor program? When it circles around a lamp, does it believe that lamp is the moon?
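To make the transverse-orientation hypothesis concrete, here is a minimal simulation sketch (the 60-degree offset and step size are arbitrary choices for illustration): an agent that holds a fixed angle to parallel rays flies in a straight line, while the same rule applied to a nearby point source spirals it into the lamp.

```python
import math

def fly(lamp, steps=2000, step_len=0.5, offset=math.radians(60), parallel=False):
    """Move an agent that keeps a constant angle between its heading and
    the perceived direction of the light at every step."""
    x, y = 100.0, 0.0                       # start 100 units from the lamp
    for _ in range(steps):
        if parallel:
            light_dir = math.radians(90)    # moonlight: same direction everywhere
        else:
            light_dir = math.atan2(lamp[1] - y, lamp[0] - x)  # toward the lamp
        heading = light_dir + offset        # hold a fixed offset from the light
        x += step_len * math.cos(heading)
        y += step_len * math.sin(heading)
    return x, y

lamp = (0.0, 0.0)
# Point source: the path spirals in and ends up circling right at the lamp.
print("point source, final distance :", round(math.hypot(*fly(lamp)), 1))
# Parallel rays: the path is a straight line that ends far from the lamp.
print("parallel rays, final distance:", round(math.hypot(*fly(lamp, parallel=True)), 1))
```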
When a child pulls its hand away from something hot, does it value not burning itself and believe that hot things burn, or place a value on not touching hot things, or just have an evolved motor program that responds to hot things? Does your answer change if you learn that the hand was directed to pull back by spinal reflexes, without involving the cortex?
Monkeys can learn to fear snakes more easily than they can learn to fear flowers (Cook & Mineka 1989). Do monkeys, and perhaps humans, have an "instinctive preference" against snakes? Is it an instinct, a preference (snake = negative utility), or a learned behavior (lab-reared monkeys show no fear of snakes until they see another monkey react fearfully)?
Can we map the preference-belief distinction onto the distinction between instinct and learned behavior? That is, are all instincts preferences, and all preferences instincts? There are things we call instincts, like spinal reflexes, that I don't think can count as preferences. And there are preferences, such as the relative values I place on the music of Bach and Berg, that are not instincts. (In fact, these are the preferences we care about. The purpose of Friendly AI is not to retain the fist-clenching instinct for future generations.)
Bias, heuristic, or preference?
A "bias" is a reasoning procedure that produces an outcome that does not agree with some logic. But the object in nature is not to conform to logic; it is to produce advantageous behavior.
Suppose you interview Fred about his preferences. Then you write a utility function for Fred. You experiment, putting Fred in different situations and observing how he responds. You observe that Fred acts in ways that fail to optimize the utility function you wrote down, in a consistently-biased way.
Is Fred displaying bias? Or does the Fred-system, including both his beliefs and the bias imposed by his reasoning processes, implement a preference that is not captured in his beliefs alone?
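As a toy illustration (the gamble and the loss-aversion weight are invented, not drawn from any study): write down the utility function Fred states in the interview, give the Fred-system a decision process that weights losses more heavily, and his observed choices look consistently "biased" against the stated function while being perfectly consistent with the system as a whole.

```python
# The utility function we wrote down after interviewing Fred, versus the
# choices the whole Fred-system makes.  The 2.0 loss weight is illustrative.

def stated_utility(outcome):
    return outcome                          # Fred says: "more money is better, linearly"

def fred_system_value(outcome):
    # Fred's decision process weights losses twice as heavily as gains.
    return outcome if outcome >= 0 else 2.0 * outcome

def choose(gamble_a, gamble_b, value):
    """Pick the gamble with the higher expected value under `value`."""
    ev = lambda g: sum(p * value(x) for p, x in g)
    return "A" if ev(gamble_a) >= ev(gamble_b) else "B"

# Gamble A: a sure $0.  Gamble B: 50% win $110, 50% lose $100.
sure_thing = [(1.0, 0)]
coin_flip  = [(0.5, 110), (0.5, -100)]

print("predicted from interview:", choose(sure_thing, coin_flip, stated_utility))     # B
print("observed choice         :", choose(sure_thing, coin_flip, fred_system_value))  # A
```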
Allegedly true story, from a Teaching Company audio lecture (I forget which one): A psychology professor was teaching a class about conditioned behavior. He also had the habit of pacing back and forth in front of the class.
The class decided to test his claims by leaning forward and looking interested when the professor moved toward the left side of the room, but acting bored when he moved toward the right side. By the end of the semester, they had trained him to give his entire lecture from the front left corner. When they asked him why he always stood there, he was surprised by the question - he wasn't even aware he had changed his habit.
If you inspected the professor's beliefs, and then studied his actions, you would conclude he was acting irrationally. But he wasn't. He was acting rationally, just not thinking rationally. His brain didn't detect the pattern in the class's behavior and deposit a proposition into his beliefs. It encoded the proper behavior, if not straight into his pre-motor cortex, then at least not into any conscious beliefs.
Did he have a bias towards the left side of the room? Or a preference for seeing students pay attention? Or a preference that became a bias when the next semester began and he kept doing it?
Take your pick - there's no right answer.
If a heuristic gives answers consistently biased in one direction across a wide range of domains, we can call it a bias. Most biases found in the literature appear to be wide-ranging and value-neutral. But the literature on biases is itself biased (deliberately) towards discussing that type of bias. If we're trawling all of human behavior for values, we may run across many instances where we can't say whether a heuristic is a bias or a preference.
As one example, I would say that the extraordinarity bias is in fact a preference. Or consider the happiness paradox: People who become paralyzed become extremely depressed only temporarily; people who win the lottery become very happy only temporarily. (Google 'happiness "set-point"'.) I've previously argued on LessWrong that this is not a bias, but a heuristic to achieve our preferences. Happiness is proportional not to our present level of utility, but to the rate of change in our utility. Trying to maximize happiness (the rate of increase of utility) in the near term maximizes total utility over a lifespan better than consciously attempting to maximize near-term utility would. This is because maximizing the rate of increase in utility over a short time period, instead of total utility over that period, prefers behavior that has a small area under the utility curve during that time but ends with a higher utility than it started with, over behavior with a large area under the utility curve that ends with a lower utility than it started with. This interpretation of happiness would mean that impact bias is not a bias at all, but a heuristic that compensates for this in order to maximize utility rather than happiness when we reason over longer time periods.
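A toy numerical version of that argument, with made-up utility trajectories: scoring by area under the curve and scoring by net change over the window rank the same two options in opposite orders.

```python
# Two hypothetical utility trajectories over the same time window (arbitrary units).
# 'grind' : low utility now, ends higher than it started (studying, training).
# 'binge' : high utility now, ends lower than it started (a spree with a hangover).

grind = [5, 4, 4, 5, 7, 9]     # small area under the curve, net change +4
binge = [5, 9, 9, 8, 6, 3]     # large area under the curve, net change -2

def total_utility(traj):
    return sum(traj)                  # what "maximize utility over the window" scores

def rate_of_increase(traj):
    return traj[-1] - traj[0]         # what "maximize happiness" scores, if happiness
                                      # tracks the change in utility

for name, traj in [("grind", grind), ("binge", binge)]:
    print(name, "area:", total_utility(traj), "net change:", rate_of_increase(traj))

# Scoring by area prefers 'binge' (40 vs 34); scoring by net change prefers
# 'grind' (+4 vs -2).  Chaining the 'grind' choice across many windows is the
# trade the paragraph above describes.
```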
Environmental factors: Are they a preference or a bias?
Evolution does not distinguish between satisfying preconditions for behavior by putting knowledge into a brain and satisfying them by relying on the statistics of the environment. This means that the environment, which is not even present in the geometric model of ethics, is also part of your values.
When the aforementioned moth circles around a lamp, is it erroneously acting on a bias, or expressing moth preferences?
Humans like having sex. The teleological purpose of this preference is to cause them to have children. Yet we don't say that they are in error if they use birth control. This suggests that we consider our true preferences to be the organismal ones that trigger positive qualia, not the underlying evolutionary preferences.
Strict monogamy causes organisms that live in family units to evolve to act more altruistically, because their siblings are as related to them as their children are (West & Gardner 2010). Suppose that people from cultures with a long history of nuclear families and strict monogamy act, on average, more altruistically than people from other cultures; and you put people from both cultures together in a new environment with neither monogamy nor nuclear families. We would probably rather say that the people from these different cultures have different values, than say that they both have the same preference to "help their genes" and that the people from the monogamous culture have an evolved bias that causes them to erroneously treat strangers nicely in this new environment. Again, we prefer the organismal preference.
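The arithmetic behind the West & Gardner point, using the standard coefficients of relatedness (the "complete promiscuity" scenario is an idealized extreme for contrast):

```python
# Standard coefficients of relatedness.
r_offspring    = 0.5    # you share half your genes with your child
r_full_sibling = 0.5    # same mother AND same father
r_half_sibling = 0.25   # same mother, different fathers

# Under strict monogamy every sibling is a full sibling, so helping a sibling
# pays the same genetic dividend as helping your own child:
print("monogamy   :", r_full_sibling == r_offspring)     # True

# Under complete promiscuity, siblings are half siblings, and helping them
# pays only half as much:
print("promiscuity:", r_half_sibling / r_offspring)       # 0.5
```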
However, if we follow this principle consistently, it prevents us from ever trying to improve ourselves, since it in effect defines our present selves as optimal.
So the "organismal vs. evolutionary" distinction doesn't help us choose what's a preference and what's a bias. Without any way of doing that, it is in principle impossible to create a category of "preferences" distinct from "preferred outcomes". A "value" consists of declarative knowledge, algorithms, and environment, taken together. Change any of those, and it's not the same value anymore.
This means that extrapolating human values into a different environment gives an error message.
A ray of hope? ...
I just made a point by presenting cases in which most people have intuitions about which outcome is correct, and showing that these intuitions don't follow a consistent rule.
So why do we have the intuitions?
If we have consistent intuitions, they must follow some rule. We just don't know what it is yet. Right?
... No.
We don't have consistent intuitions.
Any one of us has consistent intuitions; and those of us living in Western nations in the 21st century have a lot of intuitions in common. We can predict how most of these intuitions will fall out using some dominant cultural values. The examples involving monogamy and violent males rely on the present relatively high weight on the preference to reduce violent conflict. But this is a context-dependent value! <just-so story>It arises from living in a time and a place where technology makes interactions between tribes more frequent and more beneficial, and conflict more costly</just-so story>. But looking back in history, we see many people who would disagree with it:
The idea that violence (and sexism, racism, and slavery) is bad is a minority opinion in human cultures over history. Nobody likes being hit over the head with a stick by a stranger; but in pre-Christian Europe, it was the person who failed to prevent being struck, not the person doing the striking, whose virtue was criticized.
Konrad Lorenz believed that the more deadly an animal is, the more emotional attachment to its peers its species evolves, via group selection (Lorenz 1966). The past thousand years of history has been a steady process of humans building sharper claws, and choosing values that reduce their use, keeping net violence roughly constant. As weapons improve, cultural norms that promote conflict must go. First, the intellectuals (who were Christian theologians at the time) neutered masculinity; in the Enlightenment, they attacked religion; and in the 20th century, art. The ancients would probably find today's peaceful, offense-forgiving males as nauseating as I would find a future where the man on the street embraces postmodern art and literature.
This gradual sacrificing of values in order to attain more and more tolerance and empathy is the most noticeable change in human values in all of history. This means it is the least constant of human values. Yet we think of an infinite preference for non-violence and altruism as a foundational value! Our intuitions about our values are thus as mistaken as it is possible for them to be.
(The logic goes like this: Humans are learning more, and their beliefs are growing closer to the truth. Humans are becoming more tolerant and cooperative. Therefore, tolerant and cooperative values are closer to the truth. Oops! If you believe in moral truth, then you shouldn't be searching for human values in the first place!)
Catholics don't agree that having sex with a condom is good. They have an elaborate system of belief built on the idea that teleology expresses God's will, and so underlying purpose (what I call evolutionary preference) always trumps organismal preference.
And I cheated in the question on monogamy. Of course you said that being more altruistic wasn't an error. Everyone always says they're in favor of more altruism. It's like asking whether someone would like lower taxes. But the hypothesis was that people from non-monogamous or non-family-based cultures do in fact show lower levels of altruism. By hypothesis, then, they would be comfortable with their own levels of altruism, and might feel that higher levels are a bias.
Preferences are complicated and numerous, and arise in an evolutionary process that does not guarantee consistency. Having conflicting preferences makes action difficult. Energy minimization, a general principle that may underlie much of our learning, simply means reducing conflicts in a network. The most basic operations of our neurons thus probably act to reduce conflicts between preferences.
But there are no "true, foundational" preferences from which to start. There's just a big network of them that can be pushed into any one of many stable configurations, depending on the current environment. There's the Catholic configuration, and the Nazi configuration, and the modern educated tolerant cosmopolitan configuration. If you're already in one of those configurations, it seems obvious what the right conclusion is for any particular value question; and this gives the illusion that we have some underlying principle by which we can properly choose what is a value and what is a bias. But it's just circular reasoning.
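A minimal Hopfield-style sketch of that picture (the patterns and starting states are arbitrary): each update never raises the network's energy, i.e. it only reduces conflicts between connected units, and the same network settles into different stable, low-energy configurations depending on where it starts. Nothing in the dynamics marks one endpoint as the "true" one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Store two arbitrary "configurations" (think: two internally consistent
# bundles of preferences) as attractors in a small Hopfield network.
patterns = np.array([
    [ 1, -1,  1, -1,  1, -1,  1, -1],
    [ 1,  1,  1,  1, -1, -1, -1, -1],
])
n = patterns.shape[1]
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)                      # no self-connections

def settle(state, steps=50):
    """Asynchronously update units; each update never raises the energy."""
    s = state.copy()
    for _ in range(steps):
        i = rng.integers(n)
        s[i] = 1 if W[i] @ s >= 0 else -1
    return s

def energy(s):
    return -0.5 * s @ W @ s

# Start from two different noisy initial states; each relaxes to a different
# stable configuration.  Neither endpoint is privileged by the dynamics.
start_a = np.array([ 1, -1,  1, -1,  1,  1,  1, -1])
start_b = np.array([ 1,  1, -1,  1, -1, -1, -1, -1])
for start in (start_a, start_b):
    end = settle(start)
    print(end, "energy:", energy(end))
```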
What about qualia?
But everyone agrees that pleasure is good, and pain is bad, right?
Not entirely - I could point to, say, medieval Europe, when many people believed that causing yourself needless pain was virtuous. But, by and large yes.
And beside the point (although see below). Because when we talk about values, the eventual applications we have in mind are never about qualia. Nobody has heated arguments about whose qualia are better. Nobody even really cares about qualia. Nobody is going to dedicate their life to building Friendly AI in order to ensure that beings a million years from now still dislike castor oil and enjoy chocolate.
We may be arguing about preserving a tendency to commit certain acts that give us a warm qualic glow, like helping a bird with a broken wing. But I don't believe there's a dedicated small-animal-empathy quale. More likely there's a hundred inferential steps linking an action, through our knowledge and thinking processes, to a general-purpose warm-glow quale.
Value is a network concept
Abstracting human behavior into "human values" is an ill-posed problem. It's an attempt to divine a simple description of our preferences, outside the context of our environment and our decision process. But we have no consistent way of deciding what are the preferences, and what is the context. We have the illusion that we can, because our intuitions give us answers to questions about preferences - but they use our contextually-situated preferences to do so. That's circular reasoning.
The problem in trying to root out foundational values for a person is the same as in trying to root out objective values for the universe, or trying to choose the "correct" axioms for a geometry. You can pick a set that is self-consistent; but you can't label your choice "the truth".
These are all network concepts, where we try to isolate things that exist only within a complex homogeneous network. Our mental models of complex networks follow mathematics, in which you choose a set of axioms as foundational; or social structures, in which you can identify a set of people as the prime movers. But these conceptions do not even model math or social structures correctly. Axioms are chosen for convenience, but a logic is an entire network of self-consistent statements, many different subsets of which could have been chosen as axioms. Social power does not originate with the rulers, or we would still have kings.
There is a very similar class of problems, including symbol grounding (trying to root out the nodes that are the sources of meaning in a semantic network), and philosophy of science (trying to determine how or whether the scientific process of choosing a set of beliefs given a set of experimental data converges on external truth as you gather more data). The crucial difference is that we have strong reasons for believing that these networks refer to an external domain, and their statements can be tested against the results from independent access to that domain. I call these referential network concepts. One system of referential network concepts can be more right than another; one system of non-referential network concepts can only be more self-consistent than another.
Referential network concepts cannot be given 0/1 truth-values at a finer granularity than the level at which a network concept refers to something in the extensional (referred-to) domain. For example, (Quine 1968) argues that a natural-language statement cannot be unambiguously parsed beyond the granularity of the behavior associated with it. This is isomorphic to my claim above that a value/preference can't be parsed beyond the granularity of the behavior of an agent acting in an environment.
Thomas Kuhn gained notoriety by arguing (Kuhn 1962) that there is no such thing as scientific progress, but only transitions between different stable states of belief; and that modern science is only different from ancient science, not better. (He denies this in the postscript to the 1969 edition, but it is the logical implication of both his arguments and the context he presents them in.) In other words, he claims science is a non-referential network concept. An interpretation in line with Quine would instead say that science is referential at the level of the experiment, and that ambiguities may remain in how we define the fine-grained concepts used to predict the outcomes of experiments.
Determining whether a network concept domain is referential or non-referential is tricky. The distinction was not even noticed until the 19th century. Until then, everyone who had ever studied geometry, so far as I know, believed there was one "correct" geometry, with Euclid's 5 postulates as axioms. But in the early 19th century, several mathematicians proved that you could build three different, consistent geometries depending on what you put in the place of Euclid's fifth postulate. The universe we live in most likely conforms to only one of these (making geometry referential in a physics class); but the others are equally valid mathematically (making geometry non-referential in a math class).
Is value referential, or non-referential?
There are two ways of interpreting this question, depending on whether one means "human values" or "absolute values".
Judgements of value expressed in human language are referential; they refer to human behavior. So human values are referential. You can decide whether claims about a particular human's values are true or false, as long as you don't extend those claims outside the context of that human's decision process and environment. This claim is isomorphic to Quine's claim about meaning in human language.
Asking about absolute values is isomorphic to applying the symbol-grounding problem to consciousness. Consciousness exists internally, and is finer-grained than human behaviors. Providing a symbol-grounding method that satisfied Quine's requirements would not provide any meanings accessible to consciousness. Stevan Harnad (Harnad 1990) described how symbols might be grounded for consciousness in sense perceptions and statistical regularities of those perceptions.
(This brings up an important point, which I will address later: You may be able to assign referential network concepts probabilistic or else fuzzy truth values at a finer level of granularity than the level of correspondence. A preview: This doesn't get you out of the difficulty, because the ambiguous cases don't have mutual information with which they could help resolve each other.)
Can an analogous way be found to ground absolute values? Yes and no. You can choose axioms that are hard to argue with, like "existence is better than non-existence", "pleasure is better than pain", or "complexity is better than simplicity". (I find "existence is better than non-existence" pretty hard to argue with; but Buddhists disagree.) If you can interpret them in an unambiguous way, and define a utility calculus enabling you to make numeric comparisons, you may be able to make "absolute" comparisons between value systems relative to your axioms.
You would also need to make some choices we've talked about here before, such as "use summed utility" or "use average utility". And you would need to make many possibly-arbitrary interpretation assumptions such as what pleasure is, what complexity is, or what counts as an agent. The gray area between absolute and relative values is in how self-evident all these axioms, decisions, and assumptions are. But any results at all - even if they provide guidance only in decisions such as "destroy / don't destroy the universe" - would mean we could claim there is a way for values to be referential at a finer granularity than that of an agent's behavior. And things that seem arbitrary to us today may turn out not to be; for example, I've argued here that average utilitarianism can be derived from the von Neumann-Morgenstern theorem on utility.
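A trivial example of why the "summed vs. average utility" choice is not innocuous (the populations are invented): the two rules rank the same pair of outcomes in opposite orders.

```python
# Two hypothetical populations, each listed as per-person utility scores.
# 'large' : many people with modest utility.   'small' : few people, very well off.
large = [3] * 1000      # 1000 people at utility 3
small = [80] * 10       # 10 people at utility 80

def total(pop):   return sum(pop)
def average(pop): return sum(pop) / len(pop)

print("total  :", total(large), "vs", total(small))      # 3000 vs 800  -> prefers 'large'
print("average:", average(large), "vs", average(small))  # 3.0  vs 80.0 -> prefers 'small'
```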
... It doesn't matter WRT friendly AI and coherent extrapolated volition.
Even supposing there is a useful, correct, absolute lattice on value systems and/or values, it doesn't forward the project of trying to instill human values in artificial intelligences. There are two possible cases: either the absolute values agree with human values, in which case you could instill the absolute values directly and extracting human values gains you nothing; or they disagree, in which case you shouldn't be instilling human values in the first place.
Fuzzy values and fancy math don't help
So far, I've looked at cases of ambiguous values only one behavior at a time. I mentioned above that you can assign probabilities to different value interpretations of a behavior. Can we take a network of many probabilistic interpretations, and use energy minimization or some other mathematics to refine the probabilities?
No; because for the ambiguities of interest, we have no access to any of the mutual information between how to resolve two different ambiguities. The ambiguity is in whether the hypothesized "true value" would agree or disagree with the results given by the initial propositional system plus a different decision process and/or environment. In every case, this information is missing. No clever math can provide this information from our existing data, no matter how many different cases we combine.
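Here is the point in miniature, with an invented joint distribution over two binary ambiguities that is independent by construction: the mutual information is exactly zero, so resolving one ambiguity leaves the distribution over the other untouched, and no reweighting scheme can manufacture the missing information.

```python
import math

# Joint distribution over two binary "ambiguities": how to extend value A, and
# how to extend value B, into some novel environment.  Each cell is the product
# of its marginals, so the two are independent by construction.
p_joint = {('a1', 'b1'): 0.24, ('a1', 'b2'): 0.36,
           ('a2', 'b1'): 0.16, ('a2', 'b2'): 0.24}
p_a = {'a1': 0.6, 'a2': 0.4}
p_b = {'b1': 0.4, 'b2': 0.6}

mi = sum(p * math.log2(p / (p_a[a] * p_b[b])) for (a, b), p in p_joint.items())
print("mutual information:", round(mi, 10), "bits")   # 0.0

# With zero mutual information, P(A | B) == P(A): observing how one ambiguity
# resolves leaves the other exactly where it started.
```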
Nor should we hope to find correlations between "true values" that will help us refine our estimates for one value given a different unambiguous value. The search for values is isomorphic to the search for personality primitives. The approach practiced by psychologists is to use factor analysis to take thousands of answers to questions that are meant to test personality phenotype, and mathematically reduce these to discover a few underlying ("latent") independent personality variables, most famously in the Big 5 personality scale (reviewed in Goldberg 1993). In other words: The true personality traits, and by analogy the true values a person holds, are by definition independent of each other.
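For readers who haven't seen the move, here is a small sketch of it on synthetic questionnaire data (the loadings, noise level, and sample size are invented; a real Big-5 analysis uses thousands of respondents and items): factor analysis recovers a few latent variables from the answers alone.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_people = 500

# Two hidden "traits" per person (stand-ins for latent personality variables).
traits = rng.normal(size=(n_people, 2))

# Eight questionnaire items, each a noisy mixture of the two traits.
loadings = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.0], [0.7, 0.2],
                     [0.0, 1.0], [0.1, 0.9], [0.0, 0.8], [0.2, 0.7]])
answers = traits @ loadings.T + 0.3 * rng.normal(size=(n_people, 8))

# Factor analysis extracts two latent variables from the answers alone.
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0).fit(answers)
recovered = fa.transform(answers)

# The recovered factors track the original traits (up to sign and rotation).
corr = np.corrcoef(recovered.T, traits.T)[:2, 2:]
print(np.round(corr, 2))
```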
We expect, nonetheless, to find correlations between the component of these different values that resides in decision processes. This is because it is efficient to re-use decision processes as often as possible. Evolution should favor partitioning values between propositions, algorithms, and environment in a way that minimizes the number of algorithms needed. These correlations will not help us, because they have to do only with how a value is implemented within an organism, and say nothing about how the value would be extended into a different organism or environment.
In fact, I propose that the different value systems popular among humans, and the resulting ethical arguments, are largely different ways of partitioning values between propositions, algorithms, and environment, that each result in a relatively simple set of algorithms, and each in fact give the same results in most situations that our ancestors would have encountered. It is the attempt to extrapolate human values into the new, manmade environment that causes ethical disagreements. This means that our present ethical arguments are largely the result of cultural change over the past few thousand years; and that the next few hundred years of change will provide ample grounds for additional arguments even if we resolve today's disagreements.
Summary
Philosophically-difficult domains often involve network concepts, where each component depends on other components, and the dependency graph has cycles. The simplest models of network concepts suppose that there are some original, primary nodes in the network that everything depends on.
We have learned to stop applying these models to geometry and supposing there is one true set of axioms. We have learned to stop applying these models to biology, and accept that life evolved, rather than that reality is divided into Creators (the primary nodes) and Creatures. We are learning to stop applying them to morals, and accept that morality depends on context and biology, rather than being something you can extract from its context. We should also learn to stop applying them to the preferences directing the actions of intelligent agents.
Attempting to identify values is a network problem, and you cannot identify the "true" values of a species, or of a person, as they would exist outside of their current brain and environment. The only consistent result you can arrive at by trying to produce something that implements human values, is to produce more humans.
This means that attempting to instill human values into an AI is an ill-posed problem that has no complete solution. The only escape from this conclusion is to turn to absolute values - in which case you shouldn't be using human values in the first place.
This doesn't mean that we have no information about how human values can be extrapolated beyond humans. It means that the more different an agent and an environment are from the human case, the greater the number of different value systems there are that are consistent with human values. However, it appears to me, from the examples and the reasoning given here, that the components of values that we can resolve are those that are evolutionarily stable (and seldom distinctly human), while the contentious components of values that people argue about are their extensions into novel situations, which are undefined. From that I infer that, even if we pin down present-day human values precisely, the ambiguity inherent in extrapolating them into novel environments and new cognitive architectures will make the near future as contentious as the present.
References
Michael Cook & Susan Mineka (1989). Observational conditioning of fear to fear-relevant versus fear-irrelevant stimuli in rhesus monkeys. Journal of Abnormal Psychology 98(4): 448-459.
Lewis Goldberg (1993). The structure of phenotypic personality traits. American Psychologist 48: 26-34.
Stevan Harnad (1990). The symbol grounding problem. Physica D 42: 335-346.
Thomas Kuhn (1962). The Structure of Scientific Revolutions. 1st. ed., Chicago: Univ. of Chicago Press.
Konrad Lorenz (1966). On Aggression. New York: Harcourt Brace.
Willard Quine (1968). Ontological relativity. The Journal of Philosophy 65(7): 185-212.
Andreia Santos, Andreas Meyer-Lindenberg, Christine Deruelle (2010). Absence of racial, but not gender, stereotyping in Williams syndrome children. Current Biology 20(7), April 13: R307-R308.
Stuart A. West & Andy Gardner (2010). Altruism, spite, and greenbeards. Science 327: 1341-1344.