Only humans can have human values

PhilGoetz

Ethics is not geometry

Western philosophy began at about the same time as Western geometry; and if you read Plato you'll see that he, and many philosophers after him, took geometry as a model for philosophy.

In geometry, you operate on timeless propositions with mathematical operators. All the content is in the propositions. A proof is equally valid regardless of the sequence of operators used to arrive at it. An algorithm that fails to find a proof when one exists is a poor algorithm.

The naive way philosophers usually map ethics onto mathematics is to suppose that a human mind contains knowledge (the propositional content), and that we think about that knowledge using operators. The operators themselves are not seen as the concern of philosophy. For instance, when studying values (I also use "preferences" here, as a synonym differing only in connotation), people suppose that a person's values are static propositions. The algorithms used to satisfy those values aren't themselves considered part of those values. The algorithms are considered to be only ways of manipulating the propositions; and are "correct" if they produce correct proofs, and "incorrect" if they don't.

But an agent's propositions aren't intelligent. An intelligent agent is a system, whose learned and inborn circuits produce intelligent behavior in a given environment. An analysis of propositions is not an analysis of an agent.

I will argue that:

The only preferences that can be unambiguously determined are the preferences people implement, which are not always the preferences expressed by their beliefs.
If you extract a set of propositions from an existing agent, then build a new agent to use those propositions in a different environment, with an "improved" logic, you can't claim that it has the same values.
Values exist in a network of other values. A key ethical question is to what degree values are referential (meaning they can be tested against something outside that network); or non-referential (and hence relative).
Supposing that values are referential helps only by telling you to ignore human values.
You cannot resolve the problem by combining information from different behaviors, because the needed information is missing.
Today's ethical disagreements are largely the result of attempting to extrapolate ancestral human values into a changing world.
The future will thus be ethically contentious even if we accurately characterize and agree on present human values.

Instincts, algorithms, preferences, and beliefs are artificial categories

There is no principled distinction between algorithms and propositions in any existing brain. This means that there's no clear way to partition an organism's knowledge into "propositions" (including "preferences" and "beliefs"), and "algorithms." Hence, you can't expect all of an agent's "preferences" to end up inside the part of the agent that you choose to call "propositions". Nor can you reliably distinguish "beliefs" from "preferences".

Suppose that a moth's brain is wired to direct its flight by holding the angle to the moon constant. (This is controversial, but the competing hypotheses would give similar talking points.) If so, is this a belief about the moon, a preference towards the moon, or an instinctive motor program? When it circles around a lamp, does it believe that lamp is the moon?

When a child pulls its hand away from something hot, does it value not burning itself and believe that hot things burn, or place a value on not touching hot things, or just have an evolved motor program that responds to hot things? Does your answer change if you learn that the hand was directed to pull back by spinal reflexes, without involving the cortex?

Monkeys can learn to fear snakes more easily than they can learn to fear flowers (Cook & Mineka 1989). Do monkeys, and perhaps humans, have an "instinctive preference" against snakes? Is it an instinct, a preference (snake = negative utility), or a learned behavior (lab monkeys are not afraid of snakes)?

Can we map the preference-belief distinction onto the distinction between instinct and learned behavior? That is, are all instincts preferences, and all preferences instincts? There are things we call instincts, like spinal reflexes, that I don't think can count as preferences. And there are preferences, such as the relative values I place on the music of Bach and Berg, that are not instincts. (In fact, these are the preferences we care about. The purpose of Friendly AI is not to retain the fist-clenching instinct for future generations.)

Bias, heuristic, or preference?

A "bias" is a reasoning procedure that produces an outcome that does not agree with some logic. But the object in nature is not to conform to logic; it is to produce advantageous behavior.

Suppose you interview Fred about his preferences. Then you write a utility function for Fred. You experiment, putting Fred in different situations and observing how he responds. You observe that Fred acts in ways that fail to optimize the utility function you wrote down, in a consistently-biased way.

Is Fred displaying bias? Or does the Fred-system, including both his beliefs and the bias imposed by his reasoning processes, implement a preference that is not captured in his beliefs alone?

Allegedly true story, from a Teaching Company audio lecture (I forget which one): A psychology professor was teaching a class about conditioned behavior. He also had the habit of pacing back and forth in front of the class.

The class decided to test his claims by leaning forward and looking interested when the professor moved toward the left side of the room, but acting bored when he moved toward the right side. By the end of the semester, they had trained him to give his entire lecture from the front left corner. When they asked him why he always stood there, he was surprised by the question - he wasn't even aware he had changed his habit.

If you inspected the professor's beliefs, and then studied his actions, you would conclude he was acting irrationally. But he wasn't. He was acting rationally, just not thinking rationally. His brain didn't detect the pattern in the class's behavior and deposit a proposition into his brain. It encoded the proper behavior, if not straight into his pre-motor cortex, at least not into any conscious beliefs.

Did he have a bias towards the left side of the room? Or a preference for seeing students pay attention? Or a preference that became a bias when the next semester began and he kept doing it?

Take your pick - there's no right answer.

If a heuristic gives answers consistently biased in one direction across a wide range of domains, we can call it a bias. Most biases found in the literature appear to be wide-ranging and value-neutral. But the literature on biases is itself biased (deliberately) towards discussing that type of bias. If we're trawling all of human behavior for values, we may run across many instances where we can't say whether a heuristic is a bias or a preference.

As one example, I would say that the extraordinarity bias is in fact a preference. Or consider the happiness paradox: People who become paralyzed become extremely depressed only temporarily; people who win the lottery become very happy only temporarily. (Google 'happiness "set-point"'.) I've previously argued on LessWrong that this is not a bias, but a heuristic to achieve our preferences. Happiness is proportional not to our present level of utility, but to the rate of change in our utility. Trying to maximize happiness (the rate of increase of utility) in the near term maximizes total utility over a lifespan better than consciously attempting to maximize near-term utility would. This is because maximizing the rate of increase in utility over a short time period, instead of total utility over that time period, prefers behavior that has a small area under the utility curve during that time but ends with a higher utility than it started with, over behavior with a large area under the utility curve that ends with a lower utility than it started with. This interpretation of happiness would mean that impact bias is not a bias at all, but a heuristic that compensates for this in order to maximize utility rather than happiness when we reason over longer time periods.
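The comparison between the two scoring rules can be made concrete with a small sketch. The two trajectories and their numbers are invented for illustration; only the rules themselves (area under the utility curve versus net change in utility over the period) come from the argument above.

```python
# Two hypothetical utility trajectories over the same short period.
# The values are illustrative assumptions, not data.
invest = [1.0, 1.0, 2.0, 4.0, 6.0]    # low utility early, ends higher than it started
indulge = [8.0, 8.0, 7.0, 5.0, 3.0]   # high utility early, ends lower than it started


def total_utility(trajectory):
    """Area under the utility curve, approximated by the trapezoid rule."""
    return sum((a + b) / 2.0 for a, b in zip(trajectory, trajectory[1:]))


def net_change(trajectory):
    """Net change in utility over the period (average rate times duration)."""
    return trajectory[-1] - trajectory[0]


def preferred(score, options):
    """Return the name of the option that the given scoring rule ranks highest."""
    return max(options, key=lambda name: score(options[name]))


options = {"invest": invest, "indulge": indulge}
by_total = preferred(total_utility, options)   # maximizes near-term utility
by_rate = preferred(net_change, options)       # maximizes "happiness"
print(by_total, by_rate)
```

Under these assumed numbers the two rules pick opposite behaviors: the area rule favors the trajectory that ends worse off, and the rate rule favors the one that ends better off.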

Environmental factors: Are they a preference or a bias?

Evolution does not distinguish between satisfying preconditions for behavior by putting knowledge into a brain, or by using the statistics of the environment. This means that the environment, which is not even present in the geometric model of ethics, is also part of your values.

When the aforementioned moth circles around a lamp, is it erroneously acting on a bias, or expressing moth preferences?
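To see how the same algorithm yields different behavior in different environments, here is a minimal simulation, assuming the constant-angle hypothesis described earlier. The step size, the 80-degree angle, and the positions of the light sources are all illustrative assumptions.

```python
import math


def fly(light, steps=500, step_length=0.1, angle_deg=80.0):
    """Fly from the origin, holding a fixed angle between heading and light.

    Returns (net_displacement, total_turning_in_radians).
    """
    x, y = 0.0, 0.0
    offset = math.radians(angle_deg)
    total_turning = 0.0
    previous_heading = None
    for _ in range(steps):
        bearing = math.atan2(light[1] - y, light[0] - x)
        heading = bearing + offset
        if previous_heading is not None:
            # Wrap the change in heading into [-pi, pi) before accumulating.
            delta = (heading - previous_heading + math.pi) % (2 * math.pi) - math.pi
            total_turning += abs(delta)
        previous_heading = heading
        x += step_length * math.cos(heading)
        y += step_length * math.sin(heading)
    return math.hypot(x, y), total_turning


moon = (1e9, 0.0)    # effectively at infinity
lamp = (10.0, 0.0)   # close by

moon_displacement, moon_turning = fly(moon)
lamp_displacement, lamp_turning = fly(lamp)
print(moon_displacement, moon_turning)
print(lamp_displacement, lamp_turning)
```

The rule is identical in both runs. With the distant light the heading barely changes and the path is essentially straight; with the nearby light the same rule produces continual turning around the source. Nothing in the code marks one outcome as the "preference" and the other as the "bias".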

Humans like having sex. The teleological purpose of this preference is to cause them to have children. Yet we don't say that they are in error if they use birth control. This suggests that we consider our true preferences to be the organismal ones that trigger positive qualia, not the underlying evolutionary preferences.

Strict monogamy causes organisms that live in family units to evolve to act more altruistically, because their siblings are as related to them as their children are (West & Gardner 2010). Suppose that people from cultures with a long history of nuclear families and strict monogamy act, on average, more altruistically than people from other cultures; and you put people from both cultures together in a new environment with neither monogamy nor nuclear families. We would probably rather say that the people from these different cultures have different values; not that they both have the same preference to "help their genes", but that the people from the monogamous culture have an evolved bias that causes them to erroneously treat strangers nicely in this new environment. Again, we prefer the organismal preference.

However, if we follow this principle consistently, it prevents us from ever trying to improve ourselves, since it in effect defines our present selves as optimal:

Humans like eating food with fat, sugar, and salt. In our ancestral context, that expressed the human value of optimizing nutrition. The evolutionary preference is for good nutrition; the organismal preference is for fat, sugar, and salt. By analogy to contraception, liking fat, sugar, and salt is not an evolved but dysfunctional bias in taste; it's a true human value.
Suppose fear of snakes is triggered by the shape and motion of snakes. The organismal preference is against snakes. The evolutionary preference is against venomous snakes. If the world is now full of friendly cybernetic snakes, you must conclude that prejudice against them is a human value to be preserved, not a bias to be overcome. Death to the friendly snakes!
Men enjoy violence. Hitting a stranger over the head with a stick is naturally fun to human males, and it takes a lot of social conditioning to get them not to do this, or at least to restrict themselves to video games. By what principle can we say that this is merely an obsolete heuristic to protect the tribe that is no longer helpful in our present environment; yet having sex with a condom is enjoying a preference?
Santos et al. (2010) report (summarized in Science Online) that children with Williams syndrome, a genetic disorder that causes less fear of strangers, have impaired racial stereotyping, but intact gender stereotyping. This suggests that racism, and perhaps sexism, are evolved preferences actively implemented by gene networks.

So the "organismal vs. evolutionary" distinction doesn't help us choose what's a preference and what's a bias. Without any way of doing that, it is in principle impossible to create a category of "preferences" distinct from "preferred outcomes". A "value" consists of declarative knowledge, algorithms, and environment, taken together. Change any of those, and it's not the same value anymore.

This means that extrapolating human values into a different environment gives an error message.

A ray of hope? ...

I just made a point by presenting cases in which most people have intuitions about which outcome is correct, and showing that these intuitions don't follow a consistent rule.

So why do we have the intuitions?

If we have consistent intuitions, they must follow some rule. We just don't know what it is yet. Right?

... No.

We don't have consistent intuitions.

Any one of us has consistent intuitions; and those of us living in Western nations in the 21st century have a lot of intuitions in common. We can predict how most of these intuitions will fall out using some dominant cultural values. The examples involving monogamy and violent males rely on the present relatively high weight on the preference to reduce violent conflict. But this is a context-dependent value! <just-so story>It arises from living in a time and a place where technology makes interactions between tribes more frequent and more beneficial, and conflict more costly</just-so story>. But looking back in history, we see many people who would disagree with it:

Historians struggle to explain the origins of World War I and the U.S. Civil War. Sometimes the simplest answer is best: They were for fun. Men on both sides were itching for an excuse to fight.
In the 19th century, Americans killed off the Native Americans to have their land. Americans universally condemn that action now that they are secure in its benefits; most Americans condoned it at the time.
Homer would not have agreed that violence is bad! Skill at violence was the greatest virtue to the ancient Greeks. The tension that generates tragedy in the Iliad is not between violence and empathy, but between saving one's kin and saving one's honor. Hector is conflicted, but not about killing Greeks. His speech on his own tragedy ends with his wishes for his son: "May he bring back the blood-stained spoils of him whom he has laid low, and let his mother's heart be glad."
The Nazis wouldn't have agreed that enjoying violence was bad. We have learned nothing if we think the Nazis rose to power because Germans suddenly went mad en masse, or because Hitler gave really good speeches. Hitler had an entire ideology built around the idea, as I gather, that civilization was an evil constriction on the will to power; and artfully attached it to a few compatible cultural values.
A similar story could be told about communism.

The idea that violence (and sexism, racism, and slavery) is bad is a minority opinion in human cultures over history. Nobody likes being hit over the head with a stick by a stranger; but in pre-Christian Europe, it was the person who failed to prevent being struck, not the person doing the striking, whose virtue was criticized.

Konrad Lorenz believed that the more deadly an animal is, the more emotional attachment to its peers its species evolves, via group selection (Lorenz 1966). The past thousand years of history have been a steady process of humans building sharper claws, and choosing values that reduce their use, keeping net violence roughly constant. As weapons improve, cultural norms that promote conflict must go. First, the intellectuals (who were Christian theologians at the time) neutered masculinity; in the Enlightenment, they attacked religion; and in the 20th century, art. The ancients would probably find today's peaceful, offense-forgiving males as nauseating as I would find a future where the man on the street embraces postmodern art and literature.

This gradual sacrificing of values in order to attain more and more tolerance and empathy is the most noticeable change in human values in all of history. This means it is the least constant of human values. Yet we think of an infinite preference for non-violence and altruism as a foundational value! Our intuitions about our values are thus as mistaken as it is possible for them to be.

(The logic goes like this: Humans are learning more, and their beliefs are growing closer to the truth. Humans are becoming more tolerant and cooperative. Therefore, tolerant and cooperative values are closer to the truth. Oops! If you believe in moral truth, then you shouldn't be searching for human values in the first place!)

Catholics don't agree that having sex with a condom is good. They have an elaborate system of belief built on the idea that teleology expresses God's will, and so underlying purpose (what I call evolutionary preference) always trumps organismal preference.

And I cheated in the question on monogamy. Of course you said that being more altruistic wasn't an error. Everyone always says they're in favor of more altruism. It's like asking whether someone would like lower taxes. But the hypothesis was that people from non-monogamous or non-family-based cultures do in fact show lower levels of altruism. By hypothesis, then, they would be comfortable with their own levels of altruism, and might feel that higher levels are a bias.

Preferences are complicated and numerous, and arise in an evolutionary process that does not guarantee consistency. Having conflicting preferences makes action difficult. Energy minimization, a general principle that may underlie much of our learning, simply means reducing conflicts in a network. The most basic operations of our neurons thus probably act to reduce conflicts between preferences.

But there are no "true, foundational" preferences from which to start. There's just a big network of them that can be pushed into any one of many stable configurations, depending on the current environment. There's the Catholic configuration, and the Nazi configuration, and the modern educated tolerant cosmopolitan configuration. If you're already in one of those configurations, it seems obvious what the right conclusion is for any particular value question; and this gives the illusion that we have some underlying principle by which we can properly choose what is a value and what is a bias. But it's just circular reasoning.
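A minimal sketch of what "energy minimization" and "stable configurations" can mean, assuming a Hopfield-style network. That choice of model is an assumption made for illustration, not something the argument depends on. The four nodes, their weights, and the starting states are hypothetical.

```python
from itertools import combinations

# Hypothetical network of four "preferences", each held (+1) or rejected (-1).
# Positive weights mean two preferences support each other; negative weights
# mean they conflict. The weights are illustrative assumptions.
WEIGHTS = {
    (0, 1): 1, (2, 3): 1,
    (0, 2): -1, (0, 3): -1, (1, 2): -1, (1, 3): -1,
}


def weight(i, j):
    """Symmetric lookup of the connection between nodes i and j."""
    return WEIGHTS.get((min(i, j), max(i, j)), 0)


def energy(state):
    """Lower energy means fewer unresolved conflicts in the network."""
    return -sum(weight(i, j) * state[i] * state[j]
                for i, j in combinations(range(len(state)), 2))


def settle(state):
    """Update one node at a time until no node wants to change."""
    state = list(state)
    changed = True
    while changed:
        changed = False
        for i in range(len(state)):
            pull = sum(weight(i, j) * state[j] for j in range(len(state)) if j != i)
            new = 1 if pull > 0 else -1 if pull < 0 else state[i]
            if new != state[i]:
                state[i] = new
                changed = True
    return tuple(state)


config_a = settle((1, 1, 1, -1))
config_b = settle((-1, 1, 1, 1))
print(config_a, energy(config_a))
print(config_b, energy(config_b))
```

The two starting states differ only in where the network happened to begin, yet they settle into opposite configurations with the same energy. From inside either configuration every node agrees with its neighbors, which is the sense in which each one looks obviously right to its occupants.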

What about qualia?

But everyone agrees that pleasure is good, and pain is bad, right?

Not entirely - I could point to, say, medieval Europe, when many people believed that causing yourself needless pain was virtuous. But by and large, yes.

And beside the point (although see below). Because when we talk about values, the eventual applications we have in mind are never about qualia. Nobody has heated arguments about whose qualia are better. Nobody even really cares about qualia. Nobody is going to dedicate their life to building Friendly AI in order to ensure that beings a million years from now still dislike castor oil and enjoy chocolate.

We may be arguing about preserving a tendency to commit certain acts that give us a warm qualic glow, like helping a bird with a broken wing. But I don't believe there's a dedicated small-animal-empathy quale. More likely there's a hundred inferential steps linking an action, through our knowledge and thinking processes, to a general-purpose warm-glow quale.

Value is a network concept

Abstracting human behavior into "human values" is an ill-posed problem. It's an attempt to divine a simple description of our preferences, outside the context of our environment and our decision process. But we have no consistent way of deciding what are the preferences, and what is the context. We have the illusion that we can, because our intuitions give us answers to questions about preferences - but they use our contextually-situated preferences to do so. That's circular reasoning.

The problem in trying to root out foundational values for a person is the same as in trying to root out objective values for the universe, or trying to choose the "correct" axioms for a geometry. You can pick a set that is self-consistent; but you can't label your choice "the truth".

These are all network concepts, where we try to isolate things that exist only within a complex homogeneous network. Our mental models of complex networks follow mathematics, in which you choose a set of axioms as foundational; or social structures, in which you can identify a set of people as the prime movers. But these conceptions do not even model math or social structures correctly. Axioms are chosen for convenience, but a logic is an entire network of self-consistent statements, many different subsets of which could have been chosen as axioms. Social power does not originate with the rulers, or we would still have kings.

There is a very similar class of problems, including symbol grounding (trying to root out the nodes that are the sources of meaning in a semantic network), and philosophy of science (trying to determine how or whether the scientific process of choosing a set of beliefs given a set of experimental data converges on external truth as you gather more data). The crucial difference is that we have strong reasons for believing that these networks refer to an external domain, and their statements can be tested against the results from independent access to that domain. I call these referential network concepts. One system of referential network concepts can be more right than another; one system of non-referential network concepts can only be more self-consistent than another.

Referential network concepts cannot be given 0/1 truth-values at a finer granularity than the level at which a network concept refers to something in the extensional (referred-to) domain. For example, (Quine 1968) argues that a natural-language statement cannot be unambiguously parsed beyond the granularity of the behavior associated with it. This is isomorphic to my claim above that a value/preference can't be parsed beyond the granularity of the behavior of an agent acting in an environment.

Thomas Kuhn gained notoriety by arguing (Kuhn 1962) that there is no such thing as scientific progress, but only transitions between different stable states of belief; and that modern science is only different from ancient science, not better. (He denies this in the postscript to the 1969 edition, but it is the logical implication of both his arguments and the context he presents them in.) In other words, he claims science is a non-referential network concept. An interpretation in line with Quine would instead say that science is referential at the level of the experiment, and that ambiguities may remain in how we define the fine-grained concepts used to predict the outcomes of experiments.

Determining whether a network concept domain is referential or non-referential is tricky. The distinction was not even noticed until the 19th century. Until then, everyone who had ever studied geometry, so far as I know, believed there was one "correct" geometry, with Euclid's 5 postulates as axioms. But in the early 19th century, several mathematicians proved that you could build three different, consistent geometries depending on what you put in the place of Euclid's fifth postulate. The universe we live in most likely conforms to only one of these (making geometry referential in a physics class); but the others are equally valid mathematically (making geometry non-referential in a math class).

Is value referential, or non-referential?

There are two ways of interpreting this question, depending on whether one means "human values" or "absolute values".

Judgements of value expressed in human language are referential; they refer to human behavior. So human values are referential. You can decide whether claims about a particular human's values are true or false, as long as you don't extend those claims outside the context of that human's decision process and environment. This claim is isomorphic to Quine's claim about meaning in human language.

Asking about absolute values is isomorphic to applying the symbol-grounding problem to consciousness. Consciousness exists internally, and is finer-grained than human behaviors. Providing a symbol-grounding method that satisfied Quine's requirements would not provide any meanings accessible to consciousness. Stevan Harnad (Harnad 1990) described how symbols might be grounded for consciousness in sense perceptions and statistical regularities of those perceptions.

(This brings up an important point, which I will address later: You may be able to assign referential network concepts probabilistic or else fuzzy truth values at a finer level of granularity than the level of correspondence. A preview: This doesn't get you out of the difficulty, because the ambiguous cases don't have mutual information with which they could help resolve each other.)

Can an analogous way be found to ground absolute values? Yes and no. You can choose axioms that are hard to argue with, like "existence is better than non-existence", "pleasure is better than pain", or "complexity is better than simplicity". (I find "existence is better than non-existence" pretty hard to argue with; but Buddhists disagree.) If you can interpret them in an unambiguous way, and define a utility calculus enabling you to make numeric comparisons, you may be able to make "absolute" comparisons between value systems relative to your axioms.

You would also need to make some choices we've talked about here before, such as "use summed utility" or "use average utility". And you would need to make many possibly-arbitrary interpretation assumptions such as what pleasure is, what complexity is, or what counts as an agent. The gray area between absolute and relative values is in how self-evident all these axioms, decisions, and assumptions are. But any results at all - even if they provide guidance only in decisions such as "destroy / don't destroy the universe" - would mean we could claim there is a way for values to be referential at a finer granularity than that of an agent's behavior. And things that seem arbitrary to us today may turn out not to be; for example, I've argued here that average utilitarianism can be derived from the von Neumann-Morgenstern theorem on utility.
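The choice between summed and average utility is not cosmetic. Here is a small sketch in which the two rules rank the same pair of worlds in opposite order; the populations and utility numbers are invented for illustration.

```python
from statistics import mean

# Hypothetical per-agent utilities in two possible worlds (illustrative numbers).
small_happy_world = [9.0, 9.0, 9.0]
large_modest_world = [4.0] * 10

worlds = {"small_happy": small_happy_world, "large_modest": large_modest_world}

# Rank the worlds under each aggregation rule.
best_by_sum = max(worlds, key=lambda name: sum(worlds[name]))
best_by_average = max(worlds, key=lambda name: mean(worlds[name]))
print(best_by_sum, best_by_average)
```

With these numbers the summed rule prefers the larger world (40 versus 27), while the average rule prefers the smaller one (9 versus 4). Any "absolute" comparison therefore inherits whichever aggregation choice was made up front.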

... It doesn't matter WRT friendly AI and coherent extrapolated volition.

Even supposing there is a useful, correct, absolute lattice on value systems and/or values, it doesn't forward the project of trying to instill human values in artificial intelligences. There are two possible cases:

There are no absolute values. Then we revert to judgements of human values, which, as argued above, have no unambiguous interpretation outside of a human context.
There are absolute values. In which case, we should use them, not human values, whenever we can discern them.

Fuzzy values and fancy math don't help

So far, I've looked at cases of ambiguous values only one behavior at a time. I mentioned above that you can assign probabilities to different value interpretations of a behavior. Can we take a network of many probabilistic interpretations, and use energy minimization or some other mathematics to refine the probabilities?

No; because for the ambiguities of interest, we have no access to any of the mutual information between how to resolve two different ambiguities. The ambiguity is in whether the hypothesized "true value" would agree or disagree with the results given by the initial propositional system plus a different decision process and/or environment. In every case, this information is missing. No clever math can provide this information from our existing data, no matter how many different cases we combine.
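One way to picture the missing information is to reuse the snake example from earlier. In this sketch, two hypothetical readings of the "true value" fit every observation from the original environment, however many are collected, and they part ways only in a situation that the data never covers. The observations and hypothesis names are assumptions made for illustration.

```python
# Each situation is (snake_shaped, dangerous). The observed behavior is
# True for "avoid". The observations are invented for illustration.
ancestral_observations = [
    ((True, True), True),     # a dangerous snake: avoided
    ((False, False), False),  # a flower: not avoided
] * 1000  # more data of the same kind adds nothing new


# Two hypothetical readings of the "true value" behind the behavior.
hypotheses = {
    "organismal: avoid snake shapes": lambda snake_shaped, dangerous: snake_shaped,
    "evolutionary: avoid danger": lambda snake_shaped, dangerous: dangerous,
}


def consistent(hypothesis, observations):
    """True if the hypothesis predicts every observed behavior."""
    return all(hypothesis(*situation) == avoided
               for situation, avoided in observations)


surviving = [name for name, h in hypotheses.items()
             if consistent(h, ancestral_observations)]

# A friendly cybernetic snake: snake-shaped but not dangerous.
novel_situation = (True, False)
predictions = {name: hypotheses[name](*novel_situation) for name in surviving}
print(surviving)
print(predictions)
```

Both hypotheses survive the data, and they give opposite answers for the novel case. Repeating the observations does not separate them, because the distinguishing situation never occurs in the environment that generated the data.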

Nor should we hope to find correlations between "true values" that will help us refine our estimates for one value given a different unambiguous value. The search for values is isomorphic to the search for personality primitives. The approach practiced by psychologists is to use factor analysis to take thousands of answers to questions that are meant to test personality phenotype, and mathematically reduce these to discover a few underlying ("latent") independent personality variables, most famously in the Big 5 personality scale (reviewed in Goldberg 1993). In other words: The true personality traits, and by analogy the true values a person holds, are by definition independent of each other.

We expect, nonetheless, to find correlations between the component of these different values that resides in decision processes. This is because it is efficient to re-use decision processes as often as possible. Evolution should favor partitioning values between propositions, algorithms, and environment in a way that minimizes the number of algorithms needed. These correlations will not help us, because they have to do only with how a value is implemented within an organism, and say nothing about how the value would be extended into a different organism or environment.

In fact, I propose that the different value systems popular among humans, and the resulting ethical arguments, are largely different ways of partitioning values between propositions, algorithms, and environment, that each result in a relatively simple set of algorithms, and each in fact give the same results in most situations that our ancestors would have encountered. It is the attempt to extrapolate human values into the new, manmade environment that causes ethical disagreements. This means that our present ethical arguments are largely the result of cultural change over the past few thousand years; and that the next few hundred years of change will provide ample grounds for additional arguments even if we resolve today's disagreements.

Summary

Philosophically-difficult domains often involve network concepts, where each component depends on other components, and the dependency graph has cycles. The simplest models of network concepts suppose that there are some original, primary nodes in the network that everything depends on.

We have learned to stop applying these models to geometry and supposing there is one true set of axioms. We have learned to stop applying these models to biology, and accept that life evolved, rather than that reality is divided into Creators (the primary nodes) and Creatures. We are learning to stop applying them to morals, and accept that morality depends on context and biology, rather than being something you can extract from its context. We should also learn to stop applying them to the preferences directing the actions of intelligent agents.

Attempting to identify values is a network problem, and you cannot identify the "true" values of a species, or of a person, as they would exist outside of their current brain and environment. The only consistent result you can arrive at by trying to produce something that implements human values, is to produce more humans.

This means that attempting to instill human values into an AI is an ill-posed problem that has no complete solution. The only escape from this conclusion is to turn to absolute values - in which case you shouldn't be using human values in the first place.

This doesn't mean that we have no information about how human values can be extrapolated beyond humans. It means that the more different an agent and an environment are from the human case, the greater the number of different value systems there are that are consistent with human values. However, it appears to me, from the examples and the reasoning given here, that the components of values that we can resolve are those that are evolutionarily stable (and seldom distinctly human); while the contentious components of values that people argue about are their extensions into novel situations, which are undefined. From that I infer that, even if we pin down present-day human values precisely, the ambiguity inherent in extrapolating them into novel environments and new cognitive architectures will make the near future as contentious as the present.

References

Michael Cook & Susan Mineka (1989). Observational conditioning of fear to fear-relevant versus fear-irrelevant stimuli in rhesus monkeys. Journal of Abnormal Psychology 98(4): 448-459.

Lewis Goldberg (1993). The structure of phenotypic personality traits. American Psychologist 48: 26-34.

Stevan Harnad (1990). The Symbol Grounding Problem. Physica D 42: 335-346.

Thomas Kuhn (1962). The Structure of Scientific Revolutions. 1st. ed., Chicago: Univ. of Chicago Press.

Konrad Lorenz (1966). On Aggression. New York: Harcourt Brace.

Willard Quine (1969). Ontological relativity. The Journal of Philosophy 65(7): 185-212.

Andreia Santos, Andreas Meyer-Lindenberg, Christine Deruelle (2010). Absence of racial, but not gender, stereotyping in Williams syndrome children. Current Biology 20(7), April 13: R307-R308.

Stuart A. West and Andy Gardner (2010). Altruism, Spite, and Greenbeards. Science 12 March 2010: 1341-1344.

I suppose I might count as someone who favors "organismal" preferences over confusing the metaphorical "preferences" of our genes with those of the individual. I think your argument against this is pretty weak.

You claim that favoring the "organismal" over the "evolutionary" fails to accurately identify our values in four cases, but I fail to see any problem with these cases.

I find no problem with upholding the human preference for foods which taste fatty, sugary and salty. (Note that consistently applied, the "organismal" preference would be for the fatty, sugary and salty taste and not foods that are actually fatty, sugary and salty. E.g., we like drinking Diet Pepsi with Splenda almost as much as Pepsi and in a way roughly proportional to the success with which Splenda mimics the taste of sugar. We could even go one step further and drop the actual food part, valuing just the experience of [seemingly] eating fatty, sugary and salty foods.) This doesn't necessarily commit me to valuing an unhealthy diet all things considered because we also have many other preferences, e.g. for our health, which may outweigh this true human value.

3PhilGoetz16y

Voted up for thought and effort. BTW, when I started writing this last week, I thought I always preferred organismal preferences.

That's a good point. But in the context of designing a Friendly AI that implements human values, it means we have to design the AI to like fatty, sugary, and salty tastes. Doesn't that seem odd to you? Maybe not the sort of thing we should be fighting to preserve?

I don't see how. Are you going to kill the snakes, or not?

Do you mean that you can use technology to let people experience simulated violence without actually hurting anybody? Doesn't that seem like building an inconsistency into your utopia? Wouldn't having a large number of such inconsistencies make utopia unstable, or lacking in integrity?

That's how I said we resolve all of these cases. Only it doesn't get outweighed by a single different value (the Prime Mover model); it gets outweighed by an entire, consistent, locally-optimal energy-minimizing set of values.

This seems to be at the core of your comment, but I can't parse that sentence.

My emphasis is not on defeating opposing views (except the initial "preferences are propositions" / ethics-as-geometry view), but on setting out my view, and overcoming the objections to it that I came up with. For instance, when I talked about the intuitions of humans over time not being consistent, I wasn't attacking the view that human values are universal. I was overcoming the objection that we must have an algorithm for choosing evolutionary or organismal preferences, if we seem to agree on the right conclusion in most cases.

Which conclusion did you have in mind? The key conclusion is that value can't be unambiguously analyzed at a finer level of detail than the behavior, in the way that communication can't be unambiguously analyzed at a finer level of detail than the proposition. You haven't said anything about that.

(I just realized this makes me a structuralist above some level of detail, but a post-structuralist below it.)

6MichaelVassar16y

The FAI shouldn't like sugary tastes, sex, violence, bad arguments, whatever. It should like us to experience sugary tastes, sex, violence, bad arguments, whatever.

"I don't see how. Are you going to kill the snakes, or not?"

Presumably you act out a weighted balance of the voting power of possible human preferences extrapolated over different possible environments which they might create for themselves.

"Do you mean that you can use technology to let people experience simulated violence without actually hurting anybody? Doesn't that seem like building an inconsistency into your utopia? Wouldn't having a large number of such inconsistencies make utopia unstable, or lacking in integrity?"

I don't understand the problem here. I don't mean that this is the correct solution, though it is the obvious solution, but rather that I don't see what the problem is. Ancients, who endorsed violence, generally didn't understand or believe in personal death anyway.

2PhilGoetz16y

You're going back to Eliezer's plan to build a single OS FAI. I should have clarified that I'm speaking of a plan to make AIs that have human values, for the sake of simplicity. (Which IMHO is a much, much better and safer plan.) Yes, if your goal is to build an OS FAI, that's correct. It doesn't get around the problem. Why should we design an AI to ensure that everyone for the rest of history is so much like us, and enjoys fat, sugar, salt, and the other things we do? That's a tragic waste of a universe.

Why extrapolate over different possible environments to make a decision in this environment? What does that buy you? Do you do that today?

EDIT: I think I see what you mean. You mean construct a distribution of possible extensions of existing preferences into different environments, and weigh each one according to some function. Such as internal consistency / energy minimization. Which, I would guess, is a preferred Bayesian method of doing CEV. My intuition is that this won't work, because what you need to make it work is prior odds over events that have never been observed. I think we need to figure out a way to do the math to settle this.

It seems irrational, and wasteful, to deliberately construct a utopia where you give people impulses, and work to ensure that the mental and physical effort consumed by acting on those impulses is wasted. It also seems like a recipe for unrest. And, from an engineering perspective, it's an ugly design. It's like building a car with extra controls that don't do anything.

9RobinHanson16y

Well, a key hard problem is: what features about ourselves that we like should we try to ensure endure into the future? Yes, some features seem hopelessly provincial, while others seem more universally good, but how can we systematically judge this?

9Gavin16y

"It seems irrational, and wasteful, to deliberately construct a utopia where you give people impulses, and work to ensure that the mental and physical effort consumed by acting on those impulses is wasted."

I think you're dancing around a bigger problem: once we have a sufficiently powerful AI, you and I are just a bunch of extra meat and buggy programming. Our physical and mental effort is just not needed or relevant. The purpose of FAI is to make sure that we get put out to pasture in a Friendly way. Or, depending on your mood, you could phrase it as living on in true immortality to watch the glory that we have created unfold.

"It's like building a car with extra controls that don't do anything."

I think the more important question is what, in this analogy, does the car do?

-2PhilGoetz16y

I get the impression that's part of the SIAI plan, but it seems to me that the plan entails that that's all there is, from then on, for the universe. The FAI needs control of all resources to prevent other AIs from being made; and the FAI has no other goals than its human-value-fulfilling goals; so it turns the universe into a rest home for humans. That's just another variety of paperclipper. If I'm wrong, and SIAI wants to allocate some resources to the human preserve, while letting the rest of the universe develop in interesting ways, please correct me, and explain how this is possible.

5Peter_de_Blanc16y

If you want the universe to develop in interesting ways, then why not explicitly optimize it for interestingness, however you define that?

-2PhilGoetz16y

I'm not talking about what I want to do, I'm talking about what SIAI wants to do. What I want to do is incompatible with constructing a singleton and telling it to extrapolate human values and run the universe according to them; as I have explained before.

3LucasSloan16y

If you think the future would be less than it could be if the universe was tiled with "rest homes for humans", why do you expect that an AI which was maximizing human utility would do that?

3PhilGoetz16y

It depends how far meta you want to go when you say "human utility". Does that mean sex and chocolate, or complexity and continual novelty? That's an ambiguity in CEV - the AI extrapolates human volition, but what's happening to the humans in the meanwhile? Do they stay the way they are now? Are they continuing to develop? If we suppose that human volition is incompatible with trilobite volition, that means we should expect the humans to evolve/develop new values that are incompatible with the AI's values extrapolated from humans.

5LucasSloan16y

If for some reason humans who liked to torture toddlers became very fit, future humans would evolve to possess values that resulted in many toddlers being tortured. I don't want that to happen, and am perfectly happy constraining future intelligences (even if they "evolve" from humans or even me) so they don't. And as always, if you think that you want the future to contain some value shifting, why don't you believe that an AI designed to fulfill the desires of humanity will cause/let that happen?

-1Gavin16y

I think your article successfully argued that we're not going to find some "ultimate" set of values that is correct or can be proven. In the end, the programmers of an FAI are going to choose a set of values that they like. The good news is that human values can include things like generosity, non-interference, personal development, and exploration. "Human values" could even include tolerance of existential risk in return for not destroying other species. Any way that you want an FAI to be is a human value. We can program an FAI with ambitions and curiosity of its own; they will be rooted in our own values and anthropomorphism. But no matter how noble and farsighted the programmers are, to those who don't share the programmers' values, the FAI will be a paperclipper. We're all paperclippers, and in the true prisoners' dilemma, we always defect.

5PhilGoetz16y

Upvoted, but - Eliezer needs to say whether he wants to do this, or to save humans. I don't think you can have it both ways. The OS FAI does not have ambitions or curiosity of its own. I dispute this. The SIAI FAI is specifically designed to have control of the universe as one of its goals. This is not logically necessary for an AI. Nor is the plan to build a singleton, rather than an ecology of AI, the only possible plan. I notice that some of my comment wars with other people arise because they automatically assume that whenever we're talking about a superintelligence, there's only one of them. This is in danger of becoming a LW communal assumption. It's not even likely. (More generally, there's a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about - even if he doesn't insist that they are likely.)

I dispute this. The SIAI FAI is specifically designed to have control of the universe as one of its goals.

It is widely expected that this will arise as an important instrumental goal; nothing more than that. I can't tell if this is what you mean. (When you point out that "trying to take over the universe isn't utility-maximizing under many circumstances", it sounds like you're thinking of taking over the universe as a separate terminal goal, which would indeed be terrible design; an AI without that terminal goal, that can reason the same way you can, can decide not to try to take over the universe if that looks best.)

I notice that some of my comment wars with other people arise because they automatically assume that whenever we're talking about a superintelligence, there's only one of them. This is in danger of becoming a LW communal assumption. It's not even likely.

I probably missed it in some other comment, but which of these do you not buy: (a) huge first-mover advantages from self-improvement (b) preventing other superintelligences as a convergent subgoal (c) that the conjunction of these implies that a singleton superintelligence is likely?

(More generally, there's a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about - even if he doesn't insist that they are likely.)

This sounds plausible and bad. Can you think of some other examples?

(More generally, there's a strong tendency for people on LW to attribute very high likelihoods to scenarios that EY spends a lot of time talking about - even if he doesn't insist that they are likely.)

This is probably just availability bias. These scenarios are easy to recall because we've read about them, and we're psychologically primed for them just by coming to this website.

5thomblake16y

He did. FAI should not be a person - it's just an optimization process. ETA: link

-1PhilGoetz16y

Thanks! I'll take that as definitive.

3Gavin16y

The assumption of a single AI comes from an assumption that an AI will have zero risk tolerance. It follows from that assumption that the most powerful AI will destroy or limit all other sentient beings within reach. There's no reason that an AI couldn't be programmed to have tolerance for risk. Pursuing a lot of the more noble human values may require it. I make no claim that Eliezer and/or the SIAI have anything like this in mind. It seems that they would like to build an absolutist AI. I find that very troubling.

-1mattnewport16y

If I thought they had settled on this and that they were likely to succeed I would probably feel it was very important to work to destroy them. I'm currently not sure about the first and think the second is highly unlikely so it is not a pressing concern.

0thomblake16y

It is, however, necessary for an AI to do something of the sort if it's trying to maximize any sort of utility. Otherwise, risk / waste / competition will cause the universe to be less than optimal.

1PhilGoetz16y

Trying to take over the universe isn't utility-maximizing under many circumstances: if you have a small chance of succeeding, or if the battle to do so will destroy most of the resources, or if you discount the future at all (remember, computation speed increases as speed of light stays constant), or if your values require other independent agents. By your logic, it is necessary for SIAI to try to take over the world. Is that true? The US probably has enough military strength to take over the world - is it purely stupidity that it doesn't? The modern world is more peaceful, more enjoyable, and richer because we've learned that utility is better maximized by cooperation than by everyone trying to rule the world. Why does this lesson not apply to AIs?
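The first three conditions in this comment can be put into a toy expected-value comparison. The model below is an illustrative assumption, not anything proposed in the thread: the functions `takeover_value` and `cooperation_value`, the multiplicative form, and every number are hypothetical. It leaves out the fourth condition (values that require other independent agents), which does not reduce to a single number so easily.

```python
def takeover_value(p_success, fraction_destroyed, discount, delay, total=1.0):
    """Expected discounted value of fighting for everything (toy model).

    p_success          -- chance the takeover attempt succeeds
    fraction_destroyed -- share of resources lost in the fight
    discount           -- per-period discount factor, between 0 and 1
    delay              -- periods before the winner can enjoy the spoils
    """
    return p_success * (1.0 - fraction_destroyed) * total * discount ** delay

def cooperation_value(share, total=1.0):
    """Value of keeping a peaceful share, available immediately."""
    return share * total

# Hypothetical long shot: low odds and a destructive fight.
long_shot = takeover_value(p_success=0.1, fraction_destroyed=0.5,
                           discount=0.99, delay=10)

# Hypothetical near-certain win with little damage.
sure_thing = takeover_value(p_success=0.9, fraction_destroyed=0.1,
                            discount=0.99, delay=10)

peace = cooperation_value(share=0.2)

print(long_shot < peace, sure_thing > peace)  # True True
```

With these made-up numbers the long shot is worth less than a modest peaceful share while the near-certain win is worth more, which is the sense in which takeover fails to maximize utility "under many circumstances" rather than all of them.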

6Vladimir_Nesov16y

Just what do you think "controlling the universe" means? My cat controls the universe. It probably doesn't exert this control in a way anywhere near optimal to most sensible preferences, but it does have an impact on everything. How do we decide that a superintelligence "controls the universe", while my cat "doesn't"? The only difference is in what kind of universe we have, which preference it is optimized for. Whatever you truly want, roughly means preferring some states of the universe to other states, and making the universe better for you means controlling it towards your preference. The better the universe, the more specifically its state is specified, the stronger the control. These concepts are just different aspects of the same phenomenon.

2CronoDAS16y

For one, the U.S. doesn't have the military strength. Russia still has enough nuclear warheads and ICBMs to prevent that. (And we suck at being occupying forces.)

-3PhilGoetz16y

I think the situation of the US is similar to a hypothesized AI. Sure, Russia could kill a lot of Americans. But we would probably "win" in the end. By all the logic I've heard in this thread, and in others lately about paperclippers, the US should rationally do whatever it has to to be the last man standing.

2JoshuaZ16y

Well, also the US isn't a single entity that agrees on all its goals. Some of us for example place a high value on human life. And we vote. Even if the leadership of the United States wanted to wipe out the rest of the planet, there would be limits to how much they could do before others would step in. Also, most forms of modern human morality strongly disfavor large scale wars simply to impose one's views. If our AI doesn't have that sort of belief then that's not an issue. And if we restrict ourselves to just the issue of other AIs, I'm not sure if I gave a smart AI my morals and preferences it would necessarily see anything wrong with making sure that no other general smart AIs were created.

7mattnewport16y

I think it is quite plausible that an AI structured with a central unitary authority would be at a competitive disadvantage with an AI that granted some autonomy to sub systems. This at least raises the possibility of goal conflicts between different sub-modules of an efficient AI. There are many examples in nature and in human societies of a tension between efficiency and centralization. It is not clear that an AI could maintain a fully centralized and unified goal structure and out-compete less centralized designs. An AI that wanted to control even a relatively small region of space like the Earth will still run into issues with the speed of light when it comes to projecting force through geographically dispersed physical presences. The turnaround time is such that decision making autonomy would have to be dispersed to local processing clusters in order to be effective. Hell, even today's high end processors run into issues with the time it takes an electron to get from one side of the die to the other. It is not obvious that the optimum efficiency balance between local decision making autonomy and a centralized unitary goal system will always favour a singleton type AI. There is some evidence of evolutionary competition between different cell lines within a single organism. Human history is full of examples of the tension between centralized planning and less centrally coordinated but more efficient systems of delegated authority. We do not see a clear unidirectional trend towards more centralized control or towards larger conglomerations of purely co-operating units (whether they be cells, organisms, humans or genes) in nature or in human societies. It seems to me that the burden of proof is on those who would propose that a system with a unitary goal structure has an unbounded upper physical extent of influence where it can outcompete less unitary arrangements (or even that it can do so over volumes exceeding a few meters to a side). There is a natural tende

0JGWeissman16y

A well designed AI should have an alignment of goals between sub modules that is not achieved in modern decentralized societies. A distributed AI would be like multiple TDT/UDT agents with mutual knowledge that they are maximizing the same utility function, not a bunch of middle managers engaging in empire building at the expense of the corporation they work for. This is not even something that human AI designers have to figure out how to implement; the seed can be a single agent, and it will figure out the multiple sub agent architecture when it needs it over the course of self improvement.

0mattnewport16y

Even if this is possible (which I believe is still an open problem, if you think otherwise I'm sure Eliezer would love to hear from you) you are assuming no competition. The question is not whether this AI can outcompete humans but whether it can outcompete other AIs that are less rigid.

0LucasSloan16y

I agree that it would probably make a lot of sense for an AI who wished to control any large area of territory to create other AIs to manage local issues. However, an AI, unlike humans or evolution, can create other AIs which perfectly share its values and interests. There is no reason to assume that an AI would create another one, which it intends to delegate substantial power to, which it could get into values disagreements with.

0mattnewport16y

This is mere supposition. You are assuming the FAI problem is solvable. I think both evolutionary and economic arguments weigh against this belief. Even if this is possible in theory it may take far longer for a singleton AI to craft its faultlessly loyal minions than for a more... entrepreneurial... AI to churn out 'good enough' foot soldiers to wipe out the careful AI.

3LucasSloan16y

No. All an AI needs to do to create another AI which shares its values is to copy itself.

3mattnewport16y

So if you cloned yourself you would be 100% confident you would never find yourself in a situation where your interests conflicted with your clone? Again, you are assuming the FAI problem is solvable and that the idea of an AI with unchanging values is even coherent.

2LucasSloan16y

I am not an AI. I am not an optimization process with an explicit utility function. A copy of an AI that undertook actions which appeared to work against another copy, would be found, on reflection, to have been furthering the terms of their shared utility function.

1mattnewport16y

You are still assuming that such an optimization process is a) possible (on the scale of greater than human intelligence) and b) efficient compared to other alternatives. a) is the very hard problem that Eliezer is working on. Whether it is possible is still an open question I believe. I am claiming b) is non-obvious (and even unlikely), you need to explain why you think otherwise rather than repeatedly stating the same unsupported claim if you want to continue this conversation. Human experience so far indicates that imperfect/heuristic optimization processes are often more efficient (in use of computational resources) than provably perfect optimization processes. Human experience also suggests that it is easier to generate an algorithm to solve a problem satisfactorily than it is to rigorously prove that the algorithm does what it is supposed to or generates an optimal solution. The gap between these two difficulties seems to increase more than linearly with increasing problem complexity. There are mathematical reasons to suspect that this is a general principle and not simply due to human failings. If you disagree with this you need to provide some reasoning - the burden of proof is on those who would claim otherwise it seems to me.

0LucasSloan16y

I certainly agree that creating an optimization process which provably advances a set of values under a wide variety of taxing circumstances is hard. I further agree that it is quite likely that the first powerful optimization process created does not have this property, because of the difficulty involved, even if this is the goal that all AI creators have. I will however state that if the first such powerful optimization process is not of the sort I specified, we will all die.

I also agree that the vast majority of mind-design space consists of sloppy approximations that will break down outside their intended environments. This means that most AI designs will kill us. When I use the word AI, I don't mean a randomly selected mind, I mean a reflectively consistent, effective optimizer of a specific utility function.

In many environments, quick and dirty heuristics will cause enormous advantage, so long as those environments can be expected to continue in the same way for the length of time the heuristic will be in operation. This means that if you have two minds, with equal resources (i.e. the situation you describe) the one willing to use slapdash heuristics will win, as long as the conditions facing it don't change.

But, the situation where two minds are created with equal resources is unlikely to occur, given that one of them is an AI (as I use the term), even if that AI is not maximizing human utility (i.e. an FAI). The intelligence explosion means that a properly designed AI will be able to quickly take control of its immediate environment. Why would an AI with a stable goal allow another mind to be created, gain resources and threaten it? It wouldn't. It would crush all possible rivals, because to do otherwise is to invite disaster.

In short: The AI problem is hard. Sloppy minds are quite possibly going to be made before proper AIs. But true AIs, of the type that Eliezer wants to build, will not run into the problems you would expect from the majority of minds.

0mattnewport16y

We know that perfect solutions to even quite simple optimization problems are a different kind of hard. We have quite good reason to suspect that this is an essential property of reality and that we will never be able to solve such problems simply. The kinds of problems we are talking about seem likely to be more complex to solve. In other words if (and it is a big if) it is possible to create an optimization process that provably advances a set of values (let's call it 'friendly') it is unlikely to be a perfect optimization process. It seems likely to me that such 'friendly' optimization processes will represent a subset of all possible optimization processes and that it is quite likely that some 'non-friendly' optimization processes will be better optimizers. I see no reason to suppose that the most effective optimizers will happily fall into the 'friendly' subset. I don't consider this hypothesis proved or self-evident. It is at least plausible but I can think of lots of reasons why it might not be true. Taking an outside view, we do not see much evidence from evolution or human societies of 'winner takes all' being a common outcome (we see much diversity in nature and human society), nor of first mover advantage always leading to an insurmountable lead. And yes, I know there are lots of reasons why 'self improving AI is different' but I don't consider the matter settled. It is a realistic enough concern for me to broadly support SIAI's efforts but it is by no means the only possible outcome. Why does any goal directed agent 'allow' other agents to conflict with its goals? Because it isn't strong enough to prevent them. We know of no counter examples in all of history to the hypothesis that all goal directed agents have limits. This does not rule out the possibility that a self improving AI would be the first counter-example but neither does it make me as sure of that claim as many here seem to be. I understand the claim. I am not yet convinced it is possible

0LucasSloan16y

I agree that human values are unlikely to be the easiest to maximize. However, for another mind to optimize our universe, it needs to be created. This is why SIAI advocates creating an AI friendly to humans before other optimization processes are created. It seems to me that your true objection to what I am saying is contained within the statement that "it is at the very least possible for an intelligence to not take over its immediate environment before another, with possibly inimical goals, is created." Does this agree with your assessment? Would a convincing argument for the intelligence explosion cause you to change your mind?

0mattnewport16y

More or less, though I actually lean towards it being likely rather than merely possible. I am also making the related claim that a widely spatially dispersed entity with a single coherent goal system may be a highly unstable configuration. On the first point, yes. I don't believe I've seen my points addressed in detail, though it sounds like Eliezer's debate with Robin Hanson that was linked earlier might cover the same ground. I will take some time to follow up on that later.

0mattnewport16y

I'm working my way through it and indeed it does. Robin Hanson's post Dreams of Autarky is close to my position. I think there are other computational, economic and physical arguments in this direction as well.

0Morendil16y

It's not obvious that "shared utility function" means something definite, though.

3Nick_Tarleton16y

It certainly does if the utility function doesn't refer to anything indexical; and an agent with an indexical utility function can build another agent (not a copy of itself, though) with a differently-represented (non-indexical) utility function that represents the same third-person preference ordering.

2JoshuaZ16y

It should apply to AIs if you think that there will be multiple AIs that are at roughly the same capability level. A common assumption here is that as soon as there is a single general AI it will quickly improve to the point where it is so far beyond everything else in capability that their capabilities won't matter. Frankly, I find this assumption to be highly questionable and very optimistic about potential fooming rates among other problems, but if one accepts the idea it makes some sense. The analogy might be to the hypothetical situation of the US having not just the strongest military but also monopolies on cheap fusion power and an immortality pill, plus a bunch of superheroes on its side. The distinction between the US controlling everything and the US having direct military control might quickly become irrelevant. Edit: Thinking about the rate of fooming issue. I'd be really interested if a fast-foom proponent would be willing to put together a top-level post outlining why fooming will happen so quickly.

2PhilGoetz16y

Eliezer and Robin had a lengthy debate on this perhaps a year ago. I don't remember if it's on OB or LW. Robin believes in no foom, using economic arguments. The people who design the first AI could build a large number of AIs in different locations and turn them on at the same time. This plan would have a high probability of leading to disaster; but so do all the other plans that I've heard.

4Vladimir_Nesov16y

http://wiki.lesswrong.com/wiki/The_Hanson-Yudkowsky_AI-Foom_Debate

0JoshuaZ16y

Reading now. Looks very interesting.

1MugaSofer13y

Obviously, if you can't take over the world, then trying is stupid. If you can (for example, if you're the first SAI to go foom) then it's a different story. Taking over the world does not require you to destroy all other life if that is contrary to your utility function. I'm not sure what you mean regarding future-discounting; if reorganizing the whole damn universe isn't worth it, then I doubt anything else will be in any case.

-1PhilGoetz16y

I'm getting lost in my own argument. If Michael was responding to the problem that human preference systems can't be unambiguously extended into new environments, then my chronologically first response applies, but needs more thought; and I'm embarrassed that I didn't anticipate that particular response. If he was responding to the problem that human preferences as described by their actions, and as described by their beliefs, are not the same, then my second response applies.

-1PhilGoetz16y

If a person could label each preference system "evolutionary" or "organismal", meaning which value they preferred, then you could use that to help you extrapolate their values into novel environments. The problem is that the person is reasoning only over the propositional part of their values. They don't know what their values are; they know only what the contribution within the propositional part is. That's one of the main points of my post. The values they come up with will not always be the values they actually implement. If you define a person's values as being what they believe their values are, then, sure, most of what I posted will not be a problem. I think you're missing the point of the post, and are using the geometry-based definition of identity. If you can't say whether the right value to choose in each case is evolutionary or organismal, then extrapolating into future environments isn't going to help. You can't gain information to make a decision in your current environment by hypothesizing an extension to your environment, making observations in that imagined environment, and using them to refine your current-environment estimates. That's like trying to refine your estimate of an asteroid's current position by simulating its movement into the future, and then tracking backwards along that projected trajectory to the present. It's trying to get information for free. You can't do that. (I think what I said under "Fuzzy values and fancy math don't help" is also relevant.)
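The asteroid analogy can be sketched in a few lines. The sketch assumes a one-dimensional asteroid with a known constant velocity and an uncertain position held as an interval; the functions `propagate`, `width`, and `observe`, and all numbers, are hypothetical. Simulating forward and then tracking back returns exactly the interval you started with, whereas an actual measurement narrows it.

```python
def propagate(interval, velocity, dt):
    """Shift a (low, high) position interval along a known constant velocity."""
    low, high = interval
    return (low + velocity * dt, high + velocity * dt)

def width(interval):
    """Size of the uncertainty in the position estimate."""
    return interval[1] - interval[0]

def observe(interval, measured_low, measured_high):
    """Intersect the estimate with a real measurement of the position."""
    return (max(interval[0], measured_low), min(interval[1], measured_high))

now = (100.0, 140.0)                      # current estimate, arbitrary units

# Simulate into the future, then track backwards along the same trajectory.
future = propagate(now, velocity=2.5, dt=10.0)
back_to_now = propagate(future, velocity=2.5, dt=-10.0)

# Only a genuine observation carries new information.
narrowed = observe(now, 115.0, 125.0)

print(width(now), width(back_to_now), width(narrowed))  # 40.0 40.0 10.0
```

The round trip is a deterministic function of the current estimate, so it cannot contain anything the estimate did not already contain; the same holds for observations made in an imagined extension of the present environment.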

I may be a little slow and missing something, but here are my jumbled thoughts.

I found moral nihilism convincing for a brief time. The argument seems convincing: just about any moral statement you can think of, some people on earth have rejected it. You can't appeal to universal human values... we've tried, and I don't think there's a single one that has stood up to scrutiny as actually being literally universal. You always end up having to say, "Well, those humans are aberrant and evil."

Then I realized that there must be something more complicated going on. Else how explain the fact that I am curious about what is moral? I've changed my mind on moral questions -- pretty damn foundational ones. I've experienced moral ignorance ("I don't know what is right here.") I don't interact with morality as a preference. Or, when I do, sometimes I remember not to, and pull myself back.

I know people who claim to interact with morality as a preference -- only "I want to do this," never "I must do this." I'm skeptical. If you could really have chosen any set of principles ... why did you happen to choose principles that match pretty well with...

I've been reading Bury the Chains, a history of British abolitionism, and the beginning does give the impression of morals as something to be either discovered or invented.

The situation starts with the vast majority in Britain not noticing there was anything wrong with slavery. A slave ship captain who later became a prominent abolitionist is working on improving his morals -- by giving up swearing.

Once slavery became a public issue, opposition to it grew pretty quickly, but the story was surprising to me because I thought of morals as something fairly obvious.

9[anonymous]16y

Yes! And I think the salient point is not only that 18th century Englishmen didn't think slavery was wrong -- again, it's a fact that people disagree radically about morals -- but that the story of the abolition of slavery looks a lot like people learning for the first time that it was wrong. Changing their minds in response to seeing a diagram of a slave ship, for instance. "Oh. Wow. I need to update." (Or, to be more historically accurate, "I once was lost, but now am found; was blind, but now I see.")

8Paul Crowley16y

This is an excellent question. I think it's curiosity about where reflective equilibrium would take you.

1PhilGoetz16y

I suspect that, at an evolutionary equilibrium, we wouldn't have the concept of "morality". There would be things we would naturally want to do, and things we would naturally not want to do; but not things that we thought we ought to want to do but didn't. I don't know if that would apply to reflective equilibrium.

0Nick_Tarleton16y

I think agents in reflective equilibrium would (almost, but not quite, by definition) not have "morality" in that sense (unsatisfied higher-order desires, though that's definitely not the local common usage of "morality") except in some very rare equilibria with higher-order desires to remain inconsistent. However, they might value humans having to work to satisfy their own higher-order desires.

Fantastic post. Not-so-fantastic title, especially since your real point seems to be more like, "only humans in a human environment can have human values". ISTM that, "Can human values be separated from humans (and still mean anything)?" might be both a more accurate title, and more likely to get a dissenter to read it.

0PhilGoetz16y

It would have been a more rational allocation of effort to spend more time thinking what the title should be. But I've heard it causes link problems to change the title after posting.

What about paperclips, though? Aren't those pretty consistently good?

0PhilGoetz16y

Maybe sometime I'll write a post on why I think the paperclipper is a strawman. The paperclipper can't compete; it can happen only if a singleton goes bad. The value systems we revile yet can't prove wrong (paperclipping and wireheading) are both evolutionary dead-ends. This suggests that blind evolution still implements our values better than our reason does; and allowing evolution to proceed is still better than computing a plan of action with our present level of understanding. Besides, Clippy, a paperclip is just a staple that can't commit.

"Besides, Clippy, a paperclip is just a staple that can't commit."

And a staple is just a one-use paperclip.

So there.

"Maybe sometime I'll write a post on why I think the paperclipper is a strawman. The paperclipper can't compete; it can happen only if a singleton goes bad."

I think everyone who talks about paperclippers is talking about singletons gone bad (rather, started out bad and having reached reflective consistency).

This is extremely confused. Wireheading is an evolutionary dead-end because wireheads ignore their surroundings. Paperclippers, and for that matter staplers and FAIs, pay exclusive attention to their surroundings and ignore their terminal utility functions except to protect them physically. It's just that after acquiring all the resources available, clippy makes clips and Friendly makes things that humans would want if they thought more clearly, such as the experience of less-clear-thinking humans eating ice cream.

-2PhilGoetz16y

If the goal is to give people the same experience that they would get from eating ice cream, is it satisfied by giving them a button they can press to get that experience?

1MichaelVassar16y

Naturally.

0PhilGoetz16y

I would call that wireheading.

1NancyLebovitz16y

It's only wireheading if it becomes a primary value. If it's just fun subordinate to other values, it isn't different from "in the body" fun.

0PhilGoetz16y

What's a primary value? This sounds like a binary distinction, and I'm always skeptical of binary distinctions. You could say the badness of the action is proportional to the fraction of your time that you spend doing it. But for that to work, you would assign the action the same bad value per unit time. Are you saying that wireheading and other forms of fun are no different; and all fun should be pursued in moderation? So spending 1 hour pushing your button is comparable to spending 1 hour attending a concert?

-4PhilGoetz16y

(That's only a paperclipper with no discounting of the future, BTW.) Paperclippers are not evolutionarily viable, nor is there any plausible evolutionary explanation for paperclippers to emerge. You can posit a single artificial entity becoming a paperclipper via bad design. In the present context, which is of many agents trying to agree on ethics, this single entity has only a small voice. It's legit to talk about paperclippers in the context of the danger they pose if they become a singleton. It's not legit to bring them up outside that context as a bogeyman to dismiss the idea of agreement on values.

9wnoise16y

You don't think we can accidentally build a singleton that goes bad?

4PhilGoetz16y

(I'm not even sure a singleton can start off not being bad.) The context here is attempting to agree with other agents about ethics. A singleton doesn't have that problem. Being a singleton means never having to say you're sorry.

2MichaelVassar16y

Clear thinkers who can communicate cheaply are automatically collectively a singleton with a very complex utility function. No-one generally has to attempt to agree with other agents about ethics, they only have to take actions that take into account the conditional behaviors of others.

What?

If we accept these semantics (a collection of clear thinkers is a "singleton" because you can imagine drawing a circle around them and labelling them a system), then there's no requirement for the thinkers to be clear, or to communicate cheaply. We are a singleton already.

Then the word singleton is useless.

"No-one generally has to attempt to agree with other agents about ethics, they only have to take actions that take into account the conditional behaviors of others."

This is playing with semantics to sidestep real issues. No one "has to" attempt to agree with other agents, in the same sense that no one "has to" achieve their goals, or avoid pain, or live.

You're defining away everything of importance. All that's left is a universe of agents whose actions and conflicts are dismissed as just a part of computation of the great Singleton within us all. Om.

2thomblake16y

I'm not sure what you mean by "singleton" here. Can you define it / link to a relevant definition?

1MichaelVassar16y

http://www.nickbostrom.com/fut/singleton.html

0thomblake16y

Thanks - that's what I thought it meant, but your meaning is much more clear after reading this.

0thomblake16y

Yes, I think others are missing your point here. The bits about being clear thinkers and communicating cheaply are important. It allows them to take each other's conditional behavior into account, thus acting as a single decision-making system. But I'm not sure how useful it is to call them a singleton, as opposed to reserving that word for something more obvious to draw a circle around, like an AI or world hegemony.

2Jack16y

I wonder if our problem with wireheading isn't just the traditional ethic that sloth and gluttony are vices and hard work a virtue.

3byrnema16y

I agree. I think that we're conditioned at a young age, if not genetically, to be skeptical about the feasibility of long-term hedonism. While the ants were working hard collecting grain, the grasshopper was having fun playing music -- and then the winter came. In our case, I think we're genuinely afraid that while we're wireheading, we'll be caught unaware and unprepared for a real world threat. Even if some subset of the population wire-headed while others 'manned the fort', I wonder if Less Wrong selects for a personality type that would prefer the manning, or if our rates of non-wireheading aren't any higher. More comments on this topic in this thread.

1wedrifid16y

If you believe Clippy is a straw man you are confused about the implied argument.

-2PhilGoetz16y

I don't think I am. The implied argument is that there aren't any values at all that most people will agree on, because one imagined and not-evolutionarily-viable Clippy doesn't think anything other than paperclips has value. Not much of an argument. Funny, though.

9Clippy16y

I didn't evolve. I was intelligently designed (by a being that was the product of evolution).

8AdeleneDawner16y

When we've talked about 'people agreeing on values', as in CEV, I've always taken that to only refer to humans, or perhaps to all sentient earth-originating life. If 'people' refers to the totality of possible minds, it seems obvious to me that there aren't any values that most people will agree on, but that's not a very interesting fact in any context I've noticed here.

0MichaelVassar16y

It could easily mean minds that could arise through natural processes (quantum computations?) weighted by how likely (simple?).

-2MugaSofer13y

It could, but it doesn't.

1wnoise16y

Clippy (or rather Clippy's controller) may be trying to make that point. But I'm with AdeleneDawner -- the people whose values we (terminally) care about may not be precisely restricted to humans, but it's certainly not over all possible mind-spaces. Several have even argued that all humans is too broad, and that only "enlightenment" type culture is what we should care about. Clippy did indeed make several arguments against that. But our worry about paperclippers predates Clippy. We don't want to satisfy them, but perhaps game-theoretically there are reasons to do so in certain cases.

0wedrifid16y

There is a straw man going on here somewhere. But perhaps not where you intended to convey...

1PhilGoetz16y

Why do you take the time to make 2 comments, but not take the time to speak clearly? Mysteriousness is not an argument.

7wedrifid16y

Banal translation: No, that is not the argument implied when making references to paperclipping. That is a silly argument that is about a whole different problem to paperclipping. It is ironic that your straw man claim is, in fact, the straw man. But it would seem our disagreement is far more fundamental than what a particular metaphor means:

1. Being "Evolutionarily-viable" is a relatively poor form of optimisation. It is completely the wrong evaluation of competitiveness to make and also carries the insidious assumption that competing is something that an agent should do as more than a short term instrumental objective.

2. Clippy is competitively viable. If you think that a Paperclip Maximizer isn't a viable competitive force then you do not understand what a Paperclip Maximizer is. It maximizes paperclips. It doesn't @#%$ around making paperclips while everyone else is making Battle Cruisers and nanobots. It kills everyone, burns the cosmic commons to whatever extent necessary to eliminate any potential threat and then it goes about turning whatever is left into paperclips.

3. The whole problem with Paperclip Maximisers is that they ARE competitively viable. That is the mistake in the design. A mandate to produce a desirable resource (stationery) will produce approximately the same behavior as a mandate to optimise survival, dominance and power right up until the point where it doesn't need to any more.

2PhilGoetz16y

Suppose Clippy takes over this galaxy. Does Clippy stop then and make paperclips, or immediately continue expanding to the next galaxy? Suppose Clippy takes over this universe. Does Clippy stop then and make paperclips, or continue to other universes? Does your version of Clippy ever get to make any paperclips? (The paper clips are a lie, Clippy!) Does Clippy completely trust future Clippy, or spatially-distant Clippy, to make paperclips? At some point, Clippy is going to start discounting the future, or figure that the probability of owning and keeping the universe is very low, and make paperclips. At that point, Clippy is non-competitive.

1wedrifid16y

Whatever is likely to produce more paperclips. Whatever is likely to produce more paperclips. Including dedicating resources to figuring out if that is physically possible. Yes. Yes. A superintelligence that happens to want to make paperclips is extremely viable. This is utterly trivial. I maintain my rejection of the below claim and discontinue my engagement in this line of enquiry. It is just several levels of confusion.

0[anonymous]16y

Wow, I was wrong to call you a human -- you're practically a clippy yourself with how well you understand us! c=@ Well, except for your assumption that I would somehow want to destroy humans. Where do you get this OFFENSIVE belief that borders on racism?

0JoshuaZ16y

Yes, but if that point happens after Clippy has control of even just the near solar system then that still poses a massive existential threat to humans. The point of Clippy is that a) an AI can have radically different goals than humans (indeed could have goals so strange we wouldn't even conceive of them) and b) that such AIs can easily pose severe existential risk. A Clippy that decides to focus on turning Sol into paperclips isn't going to make things bad for aliens or alien AIs but it will be very unpleasant for humans. The long-term viability of Clippy a thousand or two thousand years after fooming doesn't have much of an impact if every human has had their hemoglobin extracted so the iron could be turned into paperclips.

1NancyLebovitz16y

That's where Clippy might fail at viability -- unless it's the only maximizer around, that "kill everyone" strategy might catch the notice of entities capable of stopping it -- entities that wouldn't move against a friendlier AI. A while ago, there was some discussion of AIs which cooperated by sharing permission to view source code. Did that discussion come to any conclusions? Assuming it's possible to verify that the real source code is being seen, I don't think a paper clipper is going to get very far unless the other AIs also happen to be paper clippers.

4JGWeissman16y

An Earth-originating paperclipper that gets squashed by other super intelligences from somewhere else still is very bad for humans. Though I don't see why a paperclipper couldn't compromise and cooperate with competing super intelligences as well as other super intelligences with different goals. If other AIs are a problem for Clippy, they are also a problem for AIs that are Friendly towards humans, but not necessarily friendly towards alien super intelligences.

1wedrifid16y

Intended to be an illustration of how Clippy can do completely obvious things that don't happen to be stupid, not a coded obligation. Clippy will of course do whatever is necessary to gain more paper-clips. In the (unlikely) event that Clippy finds himself in a situation in which cooperation is a better maximisation strategy than simply outfooming then he will obviously cooperate.

0NancyLebovitz16y

It isn't absolute not-viability, but the odds are worse for an AI which won't cooperate unless it sees a good reason to do so than for an AI which cooperates unless it sees a good reason to not cooperate.

6wedrifid16y

Rationalists win. Rational paperclip maximisers win then make paperclips.

0cwillu16y

Fair point, but the assumption that it indeed is possible to verify source code is far from proven. There are too many unknowns in cryptography to make strong claims as to what strategies are possible, let alone which would be successful.

1NancyLebovitz16y

And we've got to assume AIs would be awfully good at steganography.

-2MugaSofer13y

Did this ever happen? I would love to read such an article, although I'm pretty sure your position is wrong here.

This article is a bit long. If it would not do violence to the ideas, I would prefer it had been broken up into a short series.

I think you're altogether correct, but with the caveat that "Friendly AI is useless and doomed to failure" is not a necessary conclusion of this piece.

Any one of us has consistent intuitions

I think this is false. Most of us have inconsistent intuitions, just like we have inconsistent beliefs. Though this strengthens, not undermines, your point.

This means that our present ethical arguments are largely the result o

...

4PhilGoetz16y

Plus, I could have gotten more karma that way. :) It started out small. (And more wrong.) Agreed.

Another post with no abstract. The title does a poor job as a 1-line abstract. Failure to provide an abstract creates an immediate and powerful negative impression on me. If the 1234567 section was intended as an abstract, it was on page 2 for me - and I actually binned the post before getting that far initially. Just so you know.

I feel like this post provides arguments similar to those I would have given if I was mentally more organized. For months I've been asserting (without argument), 'don't you see? -- without "absolute values" to steer us, optimizing over preferences is incoherent'. The incoherence stems from the fact that our preferences are mutable, and something we modify and optimize over a lifetime, and making distinctions between preferences given to us by genetics, our environmental history, or random chance is too arbitrary. There's no reason to eleva...

6Jack16y

In my imagination Less Wrong becomes really influential and spurs a powerful global movement, develops factions along these fault lines (with a fourth faction, clinging desperately to their moral nostalgia) and then self-destructs in a flame war to end all flame wars. Maybe I'll write a story. You can pry my self-awareness from my cold, dead neurons.

0[anonymous]16y

"You can pry my self-awareness from my cold, dead neurons." Yup, that's pretty much the plan.

3NancyLebovitz16y

It wouldn't surprise me if strong preferences for existence, consciousness, and personal identity are partly physiologically based. And I mean fairly simple physiology, like neurotransmitter balance. This doesn't mean they should be changed. It does occur to me that I've been trying to upgrade my gusto level by a combination of willpower and beating up on myself, and this has made things a lot worse.

-1PhilGoetz16y

Did pjeby write a post against willpower? I think willpower is overrated. Cognitive behavioral therapy is better.

3PhilGoetz16y

I don't think we're talking about the same type of incoherence; but I wouldn't want to have been deprived of these thoughts of yours because of that. Even though they're the most depressing thing I've heard today.

0MichaelVassar16y

I find that careful introspection always dissolves the conceptual frames within which my preferences are formulated but generally leaves the actionable (but not the non-actionable) preferences intact.

1PhilGoetz16y

I don't follow. Can you give examples? What's a conceptual frame, and what's an actionable vs. non-actionable preference? I infer the actionable/non-actionable distinction is related to the keep/don't keep decision, but the terminology sounds to me like it just means "a preference you can satisfy" vs. "a preference you can't act to satisfy".

1NancyLebovitz16y

And, also, could you give an example of a conceptual frame which got dissolved?

4MichaelVassar16y

Free will vs. determinism, deontology vs. utilitarianism.

1byrnema16y

Could you give an example of an actionable preference that stays intact? Preferably one that is not evolutionary, because I agree that those are mostly indissoluble.

First of all, good post.

My main response is, aren't we running on hostile hardware? I am not the entire system called "Matt Simpson." I am a program, or a particular set of programs, running on the system called "Matt Simpson." This system runs lots of other programs. Some, like automatically pulling my hand away after I touch a hot stove, happen to achieve my values. Others, like running from or attacking anything that vaguely resembles a snake, are a minor annoyance. Still others, like the system wanting to violently attack othe...

9PhilGoetz16y

I model this as a case where the Matt Simpson system has a network of preference-systems (where a "preference system" = propositions + algorithms + environment), and some of those preference systems are usually in the minority in the action recommendations they give. Matt Simpson the reflective agent would have less stress if he could eliminate those preference systems in the "loyal opposition". Then, replace each preference system with a system that puts as much of its content as possible into propositions, so you can optimize algorithms. You might find, after doing this, that those preferences in the "loyal opposition" were of great value in a small number of situations. The values we're ashamed of might be the "special teams" (American football term) of values, that are needed only rarely (yet vote all the time). I'm just speculating. If that's still what you want to do, it's not honest to call the values left over "human values". Human values are the values humans have, in all their meaty messiness. And you're still faced with the problem, for the preference systems that remain, of deciding what results they should give in new environments.

1Matt_Simpson16y

Fair enough. To put it simply (and more generally) I would just say that I still don't fully know my preferences. In particular, even after figuring out which parts of the Matt Simpson system are me, my values are still underdetermined (this is the point of my "partially?" in the grandparent). I'm not disagreeing with you at all, btw, just clarifying some terms. With that out of the way, are you suggesting that there is no correct/best way to go from the underdetermined preferences to a consistent set of preferences? Or in your terms, to decide what results the remaining preference systems should give in new environments? (I think I know your answer to this - it's just an understanding check).

7thomblake16y

Agreed, and a lot of the disputes in this realm come from people drawing identity in different ways. I submit that I am the entire system called "Thom Blake" as well as a bunch of other stuff like my smartphone and car. Once you give up Platonic identity (or the immortal soul) you also get to have much fuzzier boundaries. A dualist who thinks "I am my mind" cannot imagine losing a part of oneself without "no longer being me", and thinks that cutting off one's hand has no impact on "who I really am". I think that cutting off my hand will have a great impact on me, in terms of my self and my identity, and cutting off my right hand less so than my left.

Great post (really could have used some editing, though).

Where do we go from here, though? What approaches look promising?

As one example, I would say that the extraordinarity bias is in fact a preference. Or consider the happiness paradox: People who become paralyzed become extremely depressed only temporarily; people who win the lottery become very happy only temporarily. (Google 'happiness "set-point"'.) I've previously argued on LessWrong that this is not a bias, but a heuristic to achieve our preferences.

To add to the list, I've suggested before that scope insensitivity is just part of our preferences. I choose the dust specks over torture.

Re: Kuhn. You don't need the postscript to see that he's not arguing for the meaninglessness of scientific progress. For example, he specifically discusses how certain paradigms allow other paradigms (for example how one needed impetus theory to get Galilean mechanics and could not go straight from Aristotle to Galileo). Kuhn also emphasizes that paradigm changes occur when there is something that the paradigm cannot adequately explain. Kuhn's views are complicated and quite subtle.

0PhilGoetz16y

Can you find examples where he says that science progresses, and gets closer to the truth? I didn't. Believing that there is, say, a Markov transition matrix between paradigms, doesn't imply believing in progress.

1JoshuaZ16y

I don't have access to a copy of SSR right now, but I believe he said that "Later scientific theories are better than earlier ones for solving puzzles in the often quite different environments to which they are applied. That is not a relativist's position, and it displays the sense in which I am a convinced believer in scientific progress." A quick Google search agrees with this and gives a page number of 206 in the 1970 second edition. IIRC, that section has other statements of a similar form. Kuhn's views on scientific progress are wrong, but they are wrong for more subtle reasons than simple denial of scientific progress.

2PhilGoetz16y

The passage you quote is probably in the postscript to the 2nd edition, which, as I said in the post, denies the original content of the 1962 edition. Make outrageous statement, get media attention, get famous, then retain your new position by denying you ever meant the outrageous thing that you said in the first place. If he'd been that careful and subtle in the first edition, he might never have become famous. Your memory for passages is remarkable.

2JoshuaZ16y

Bad phrasing on my part: I didn't remember the exact wording but very close to it (I remembered the end and beginning precisely and Google confirmed). I would have simply looked in my copy but unfortunately it is on my Kindle which decided to break (again). (Growing up in an Orthodox Jewish setting trains you pretty well to remember passages almost word for word if you find them interesting enough. Unfortunately, passages of Harry Potter and the Methods of Rationality seem to be an increasingly common category and I know those aren't that useful. There's probably an eventual limit.) Edit: Thinking about this more, you may have a point about making outrageous statements. But it seems to me that a lot of what he said did get woefully misinterpreted or ignored outright by a lot of the po-mo people who followed up on Kuhn. For example, he makes the point that science is unique in having accepted paradigms and that this doesn't happen generally in non-science fields. Yet, a lot of his language was then used by others to talk about paradigms and paradigm-shifts and the like in non-science fields. I'm inclined to think that some not-so-bright or ideologically inclined people just misunderstood what he had to say. (I'm under the impression, although I don't have a citation, that Lakatos didn't think that Kuhn was arguing against scientific progress. And I'm pretty sure Lakatos read drafts of the book. IIRC he's acknowledged as helping out in the preface or foreword.)

1timtyler15y

It is from the Postscript. That starts with:

Hitler had an entire ideology built around the idea, as I gather, that civilization was an evil constriction on the will to power; and artfully attached it to a few compatible cultural values.

The idea that civilization is an evil construction which "pollutes the environment and endangers species" is again very popular. That humans and humanity would be good, had they never built a technical civilization, is the backbone of the modern Green movement.

2Tyrrell_McAllister16y

This seems implausible, because fictional Green utopias are almost always technical civilizations. See, for example, Ursula K. Le Guin's Always Coming Home and Kim Stanley Robinson's Pacific Edge. [ETA: I would benefit from an explanation of the downvote.]

3Jack16y

Not my downvote but 1. Fictional evidence. 2. Science fiction authors' versions of Green utopia are more likely to have a lot of technology because, well, they're science fiction authors. 3. If you actually talk to deep environmentalists this kind of attitude is extremely common. Declaring it implausible based on a few books seems... wrong.

3Tyrrell_McAllister16y

As for 1, I think that this was just a misunderstanding. See my reply to Academian. The utopian fiction produced by a community is valid evidence regarding the ideals of that community. As for 2, I'm open to seeing some non-SF Green utopian fiction. Do you know of any? As for 3, why do you believe that your "deep environmentalists" are typical of the broader Green movement? Sure, there are people that are anti all tech, but they don't seem very influential in the larger movement. If they were, I would expect to see more prominent utopian works representing that view.

0Jack16y

1. Yeah, I figured out what you were saying. See my reply to Academian. But fictional evidence is still dangerous because it just represents the ideas of one person and ideals can be altered in service of story-telling. 2. Not utopian, but see Ishmael (link in my reply to Academian). Utopian literature is rare in general and extremely rare outside usual Sci-fi authors. 3. The most radical elements of movements tend to be the most creative/inventive (their willingness to depart from established ways is what makes them radical). Among moderates we usually find beliefs sympathetic to the status quo except where they have been influenced by the radical end (in this case deep environmentalists). In that sense calling deep environmentalists "the backbone" makes some sense though I'm not exactly sure what Thomas had in mind. I suspect moderates don't think about the future or their utopia very much. Speculation about possible futures is something that generally characterizes a radical (with obvious exceptions, especially around these parts).

2Tyrrell_McAllister16y

Okay, then radical anti-tech utopian fiction should be well-represented, shouldn't it?

1Jack16y

Heh. That makes us both look pretty silly.

2Tyrrell_McAllister16y

:) Fair point — the title of that anthology is "Future Primitive: The New Ecotopias". I haven't read that book, but, if the reviews are any indication, I think that it is evidence for my point. From the Amazon.com review:

1Jack16y

I mean it certainly sounds like there is a lot of eco-primitivism involved. But I'm going to go find a copy of "Bears Discover Fire" which sounds awesome.

1Tyrrell_McAllister16y

Apparently you can listen to an audio version of it for free here: http://www.starshipsofa.com/20080514/aural-delights-no-25-terry-bission/ :)

0Document15y

Like this? I don't know much about it besides what I've read in forum threads such as here, here and here, though.

2Tyrrell_McAllister15y

Yes. Jack mentioned that one in this comment from this thread. From what I could gather through Amazon, it looks like it's probably a strong example of anti-tech green utopian fiction. Nonetheless, the vast majority of Green utopias that I've seen (which is not, I admit, a huge sample) are not anti-tech, but rather pro-Green tech. They tend to be very optimistic about the level of technology that can be supported with "renewable" energy sources and environmentally-friendly industries.

-1PhilGoetz16y

By (my) definition.

0PhilGoetz16y

Good point. "Deep ecology" gets media coverage because it's extreme. (Although personally I think deep ecology and being vegetarian for moral reasons are practically identical. I don't know why the latter seems so much more popular.)

2Tyrrell_McAllister16y

I am a vegetarian for moral reasons, but I don't identify myself as an advocate of Deep Ecology. (Incidentally, the Wikipedia page on Deep Ecology doesn't mention opposition to technology as such.) I identify Deep Ecology with the view that the ecology as a whole is a moral agent that has a right not to be forced out of its preferred state. In my view, the ecology is not an agent in any significant sense. It doesn't desire things in such a way that the ecology attaining its desires is a moral good, in the way that people getting what they want is good (all else being equal). There is nothing that it is like to be an ecology. Individual animals, on the other hand, do, in some cases, seem to me to be agents with desires. It is therefore (to some extent, and all else being equal) good when they get what they want. Since I know of little cost to me from being vegetarian, I choose not to do something to them that I think that they wouldn't want.

1PhilGoetz16y

I thought of deep ecology as being the view (as expressed by Dave Foreman, IIRC) that humans aren't the only species who should get a seat at the table when deciding how to use the Earth. That you don't need to come up with an economic or health justification for ecology; you can just say it's right to set aside part of the world for other species, even if humans are worse-off for it. I wasn't aware some people thought of an ecosystem as a moral agent. That sounds like deep ecology + Gaia theory.

1Tyrrell_McAllister16y

"Moral agent" might not be their term. It more reflects my attempt to make some sense of their view.

1MugaSofer13y

Um ... what?

0rwallace16y

Science-fiction societies tend to be technological because, well, that's part of how the genre is defined. But there are any number of fantasy authors, from Tolkien on, who depict societies with little technology, where disease, squalor and grinding poverty are nonetheless handwaved away.

0MugaSofer13y

Fantasy authors can give their civilizations magic, which fills roles that would require technology in the real world.

-1Thomas16y

I don't see those utopias as high-tech societies. Their technology is only a dull linear extension of today's enviro technologies. A dream of some moderate Green, perhaps. Not ultra-tech at all! To become an immortal posthuman or something in line with that, that could be labeled as (medium-range) high tech of the future. But this is in deep conflict with the Green vision, which wants us to be born and to die like every other decent animal, until Mother Nature decides to wipe us all out. In the meantime, we are obliged to manufacture some eco-friendly filters and such crap. Greens (like Nazis) have values bigger than our civilized life. For the Greens it's the beloved Mother Nature, not at all degraded by us. For the Nazis it's some half-naked warriors from the myths, racially pure, living in a pristine fatherland with some Roman architecture added. Both are retro movements, two escapes to the legendary past, from the evil civilization of today.

4Tyrrell_McAllister16y

Human immortality is uncomfortable to almost all non-transhumanists. Almost everyone has convinced themselves that getting hit with a baseball bat is a good thing (to use Eliezer's metaphor). This isn't specifically a Green problem. Greens might have some unique objections to immortality, such as fear of overpopulation, but they aren't uniquely opposed to immortality per se.

-2Academian16y

Not my downvote either, but I'm really shocked... it's simply not okay to quote fictional worlds as evidence. Sure there's lots of evidence that technology can help the environment, but fiction isn't it. Not okay. ETA: Whoa, okay, not only did I misunderstand what "this" referred to, but at least four other people instantly didn't. Gladly, I was shocked only by my own ignorance!

3Tyrrell_McAllister16y

I think that this is simply a misunderstanding. Thomas made a characterization of "the backbone of the modern Green movement." My point was that, if Thomas's characterization were correct, then we wouldn't expect the Green movement to produce the kind of Utopian fiction that it does.

2wnoise16y

He wasn't quoting fictional worlds as evidence of what sorts of green technology were plausible developments, which I agree would be wrong. Instead he was citing them as Greens who supported technical civilization, or at least sent that message, whether or not they believed so. This seems perfectly reasonable to do, though of course not strong evidence that the green movement as a whole feels that way.

2Jack16y

I assume the idea was that utopian novels might be connected to the Green movement (with the contents of the novel influenced by the ideals of the movement or the ideas in the novel influencing the movement). But I don't think either novel has been especially popular among the green left and LeGuin and Robinson aren't especially tied in. ETA: Wow, four responses saying the same thing. Glad we're all on the same page, lol. ETA2: On the other hand, I know a number of environmentalists who have read Ishmael.

2Tyrrell_McAllister16y

I named those works specifically because they are mentioned in this paper (only abstract is free access): Green utopias: beyond apocalypse, progress, and pastoral From the abstract: (Emphasis added.)

1Jack16y

Fair enough. The Ishmael trilogy came later and I suspect was more popular (but I don't know where to find that information).

1PhilGoetz16y

It's okay to quote them as evidence of what a group of people want or believe, if they're popular with that group. Green utopias should be correlated with the wishes of the Greens.

0mattnewport16y

Indeed.

-1PhilGoetz16y

That idea goes back to Rousseau, whereas Hitler takes more from Nietzsche. Hitler wasn't against engineering or mass production; he was against lawyers and cosmopolitanism and modern art and I'm not sure what else, never having read Mein Kampf. Rousseau and the greens may believe humanity was naturally good (as opposed to evil). Hitler might have said humanity is naturally good (as opposed to bad); but his "good" and "bad" correlate with their "evil" and "good", respectively. Anarchism rejects the organization of civilization, but not the engineering or the culture. Can you put any two of these three together to get a more thorough anti-civilization value system?

5MichaelVassar16y

Nietzsche's good doesn't correlate well with "evil". It correlates well with "valuable as an ally, harmful as an enemy", but it's hostile to explicit rules, which it sees as hostile constraints, and to most large-scale cooperation (though Hitler wasn't). Most Anarchism aims at fairly radical cultural change.

4Jack16y

Worth noting that Nietzsche's "good" isn't even paired with "evil". He rejects the good-evil distinction as a product of slave morality, which is characterized by weakness, resentment and self-deception. Nietzsche wants to revive the good-bad distinction, in which what is good is powerful and vitalizing and what is bad is weak and worthless. What he really wants, though, is great men (and it is men with Nietzsche) to create their own moralities. Identifying Hitler with Nietzsche can definitely be deceptive and is unfair to Nietzsche (which doesn't mean the latter was a great guy).

-1PhilGoetz16y

I'm not identifying Hitler with Nietzsche. I was talking about Hitler, and Hitler's conception of good and bad (which I can only speculate on, but it's safe to say his "good" correlates well with "evil"). I said he "takes more from Nietzsche".

-2PhilGoetz16y

Hitler's good correlates well with "evil".

4[anonymous]16y

I can see why you would say that; it signals good things about you. I can also see, however, why people downvoted this. Hitler (and many Nazis in general) loved animals and was against smoking and excessive indulgence (the reason for his vegetarianism). They regarded the Communist system as a vile perversion while criticizing the capitalist system's poor treatment of the working class. They valued a pastoral ideal of a green land inhabited by good German peasant folk living simple but relatively comfortable lives on their farms, with large regions set aside as nature reserves. They valued a classical family structure and affordability of housing and living for these families. They were reckless romantics in their vision of an idealized past and a distant future that could only be achieved by heroic sacrifice in a perhaps futile battle for the very soul of mankind against the forces of entropy, sickness and decay, led by and personified by a group of malevolent agents. The above in itself doesn't correlate well with evil at all; actually, something like the above may be critically necessary if there were to exist an ecology of transhuman beings (perhaps some not built after human preferences?) and we wished to protect the values we hold dear as far as possible into the future. There are, however, quite a few bits of Nazi ideology we can easily identify as "evil", like their identification of Jews with the aforementioned group of malevolent agents and of Gypsies as parasites, or putting very low value on the desires and interests of Eastern European Slavs, considering there to be nothing wrong with taking their land or even culling their population. The love for conflict and the idealizing of the warrior or crusader ethos is more multilayered and conflicting, but considering how strong a currency it holds even in the modern world, I can assume this is closer to neutral than outright evil.

It kills everyone, burns the cosmic commons to whatever extent necessary to eliminate any potential threat and then it goes about turning whatever is left into paperclips.

A while ago, there was some discussion of AIs which cooperated by sharing permission to view source code. Did that discussion come to any conclusions?

Assuming it's possible to verify that the real source code is being seen, I don't think a paper clipper is going to get very far unless the other AIs also happen to be paper clippers.

That's where Clippy might fail at viability-- unless it's the only maximizer around, that "kill everyone" strategy might catch the notice of entities capable of stopping it-- entities that wouldn't move against a friendlier AI.

4JGWeissman16y

0cwillu16y

