This article poses some major questions concerning morality, each broken up into sub-questions intended to help in answering the major question; it's not a criticism of any morality in particular, but rather what I hope is a useful way to consider any moral system, and to help people challenge their assumptions about their own moral systems. I don't expect responses to try to answer these questions; indeed, I'd prefer you don't. My preferred responses would be changes, additions, clarifications, or challenges to the questions or to the objective of this article.
First major question: Could you morally advocate other people adopt your moral system?
This isn't as trivial a question as it seems on its face. Take a strawman hedonism, for a very simple example. Is a hedonist's pleasure maximized by encouraging other people to pursue -their- pleasure? Or would it be better served by convincing them to pursue other people's (a class of people of which our strawman hedonist is a member) pleasure?
It's not merely selfish moralities which suffer meta-moral problems. I've encountered a few near-Comtean altruists who will readily admit their morality makes them miserable; the idea that other people are worse off than them fills them with a deep guilt which they cannot resolve. If their goal is truly the happiness of others, spreading their moral system is a short-term evil. (It may be a long-term good, depending on how they do their accounting, but non-moral altruism isn't actually a rare quality, so I think an honest accounting would suggest their moral system doesn't add much additional altruism to the system, only a lot of guilt about the fact that not much altruistic action is taking place.)
Note: I use the word "altruism" here in its modern, non-Comtean sense. Altruism is that which benefits others.
Does your moral system make you unhappy, on the whole? Does it, like most moral systems, place a value on happiness? Would it make the average person less or more happy, if they and they alone adopted it? Are your expectations of the moral value of your moral system predicated on an unrealistic scenario of universal acceptance? Maybe your moral system isn't itself very moral.
Second: Do you think your moral system makes you a more moral person?
Does your moral system promote moral actions? How much of the attention you give your morality is spent feeling good because you feel like you've effectively promoted your moral system, rather than promoting the values inherent in it?
Do you behave any differently than you would if you operated under a "common law" morality, such as social norms and laws? That is, does your ethical system make you behave differently than if you didn't possess it? Are you evaluating the merits of your moral system solely on how it answers hypothetical situations, rather than how it addresses your day-to-day life?
Does your moral system promote behaviors you're uncomfortable with and/or could not actually do, such as pushing people in the way of trolleys to save more people?
Third: Does your moral system promote morality, or itself as a moral system?
Is the primary contribution of your moral system to your life adding outrage that other people -don't- follow your moral system? Do you feel that people who follow other moral systems are immoral even if they end up behaving in exactly the same way you do? Does your moral system imply complex calculations which aren't actually taking place? Is the primary purpose of your moral system encouraging moral behavior, or defining what the moral behavior would have been after the fact?
Considered as a meme or memeplex, does your moral system seem better suited to propagating itself than to encouraging morality? Do you think "The primary purpose of this moral system is ensuring that these morals continue to exist" could be an accurate description of your moral system? Does the moral system promote the belief that people who don't follow it are completely immoral?
Fourth: Is the major purpose of your morality morality itself?
This is a rather tough question to elaborate with further questions, so I suppose I should try to clarify a bit first: Take a strawman utilitarianism where "utility" -really is- what the morality is all about, where somebody has painstakingly gone through and assigned utility points to various things (this is kind of common in game-based moral systems, where you're just accumulating some kind of moral points, positive or negative). Or imagine (tough, I know) a religious morality where the sole objective of the moral system is satisfying God's will. That is, does your moral system define morality to be about something abstract and immeasurable, defined only in the context of your moral system? Is your moral system a tautology, which must be accepted to even be meaningful?
This one can be difficult to identify from the inside, because to some extent -all- human morality is tautological; you have to identify it with respect to other moralities, to see if it's a unique island of tautology, or whether it applies to human moral concerns in the general case. With that in mind, when you argue with other people about your ethical system, do they -always- seem to miss the point? Do they keep trying to reframe moral questions in terms of other moral systems? Do they bring up things which have nothing to do with (your) morality?
Related: Pinpointing Utility
Let's go for lunch at the Hypothetical Diner; I have something I want to discuss with you.
We will pick our lunch from the set of possible orders, and we will receive a meal drawn from the set of possible meals, O. Speaking in general, each possible order has an associated probability distribution over O. The Hypothetical Diner takes care to simplify your analysis; the probability distribution is trivial: you always get exactly what you ordered.

Again to simplify your lunch, the Hypothetical Diner offers only two choices on the menu: the Soup, and the Bagel.

To then complicate things so that we have something to talk about, suppose there is some set M of ways other things could be that may affect your preferences. Perhaps you have sore teeth on some days.

Suppose for the purposes of this hypothetical lunch date that you are VNM rational. Shocking, I know, but the hypothetical results are clear: you have a utility function, U. The domain of the utility function is the product of all the variables that affect your preferences (which meal, and whether your teeth are sore):

U: M x O -> utility
In our case, if your teeth are sore, you prefer the soup, as it is less painful. If your teeth are not sore, you prefer the bagel, because it is tastier:
U(sore & soup) > U(sore & bagel)
U(~sore & soup) < U(~sore & bagel)
Your global utility function can be partially applied to some m in M to get an "object-level" utility function U_m: O -> utility. Note that the restrictions of U made in this way need not have any resemblance to each other; they are completely separate.

It is convenient to think about and define these restricted "utility function patches" separately. Let's pick some units and datums so we can get concrete numbers for our utilities:

U_sore(soup) = 1 ; U_sore(bagel) = 0
U_unsore(soup) = 0 ; U_unsore(bagel) = 1

Those are separate utility functions now, so we could pick units and datums separately. Because of this, the sore numbers are totally incommensurable with the unsore numbers. Don't try to compare them between the utility functions or you will get type-poisoning. The actual numbers are just a straightforward encoding of the preferences mentioned above.
What if we are unsure about where we fall in M? Say you won't know whether your teeth are sore until you take the first bite; you have a probability distribution over M. Maybe you are 70% sure that your teeth won't hurt you today. What should you order?
Well, it's usually a good idea to maximize expected utility:
EU(soup) = 30%*U(sore&soup) + 70%*U(~sore&soup) = ???
EU(bagel) = 30%*U(sore&bagel) + 70%*U(~sore&bagel) = ???
Suddenly we need those utility function patches to be commensurable so that we can actually compute these, but we went and defined them separately. Darn. All is not lost, though: recall that they are just restrictions of a global utility function to a particular soreness-circumstance, with some (positive) linear transforms, f_m, thrown in to make the numbers nice:

f_sore(U(sore&soup)) = 1 ; f_sore(U(sore&bagel)) = 0
f_unsore(U(~sore&soup)) = 0 ; f_unsore(U(~sore&bagel)) = 1
At this point, it's just a bit of clever function-inverting and all is dandy. We can pick some linear transform g to be canonical, and transform all the utility function patches into that basis. So for all m, we can get g(U(m & o)) by inverting the f_m and then applying g:

g.U(sore & x) = (g.inv(f_sore).f_sore)(U(sore & x)) = k_sore*U_sore(x) + c_sore
g.U(~sore & x) = (g.inv(f_unsore).f_unsore)(U(~sore & x)) = k_unsore*U_unsore(x) + c_unsore

(I'm using . to represent composition of those transforms. I hope that's not too confusing.)
Linear transforms are really nice; all the inverting and composing collapses down to a scale k and an offset c for each utility function patch. Now we've turned our bag of utility function patches into a utility function quilt! One more bit of math before we get back to deciding what to eat:
EU(x) = P(sore) *(k_sore *U_sore(x) + c_sore) + (1-P(sore))*(k_unsore*U_unsore(x) + c_unsore)
Notice that the terms involving c_m do not involve x, meaning that the c_m terms don't affect our decision, so we can cancel them out and forget they ever existed! This is only true because I've implicitly assumed that P(m) does not depend on our actions. If it did, like if we could go to the dentist or take some painkillers, then it would be P(m | x), and c_m would be relevant in the whole joint decision.
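To see the cancellation concretely, here is a minimal Python sketch. The offsets are arbitrary numbers I made up, and the scales anticipate the k_unsore = 1/5 chosen below; whatever c_m we pick, both options' expected utilities shift by the same constant, so the decision is unchanged.

```python
# Minimal sketch: offsets c_m shift every option's EU by the same
# constant, so they never change which option maximizes expected
# utility (assuming P(m) is independent of our action).
P = {"sore": 0.3, "unsore": 0.7}                # distribution over M
U = {"sore":   {"soup": 1.0, "bagel": 0.0},     # patch U_sore
     "unsore": {"soup": 0.0, "bagel": 1.0}}     # patch U_unsore
k = {"sore": 1.0, "unsore": 0.2}                # scales k_m
c = {"sore": 5.0, "unsore": -3.0}               # arbitrary offsets c_m

def eu(option, offsets):
    return sum(P[m] * (k[m] * U[m][option] + offsets[m]) for m in P)

no_offsets = {"sore": 0.0, "unsore": 0.0}
for offsets in (no_offsets, c):
    print(eu("soup", offsets), eu("bagel", offsets))
# 0.3  0.14    (without offsets)
# -0.3 -0.46   (with offsets: both shifted by -0.6; soup still wins)
```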
We can define the canonical utility basis g to be whatever we like (among positive linear transforms); for example, we can make it equal to f_sore so that we can at least keep the simple numbers from U_sore. Then we throw all the c_m's away, because they don't matter. Then it's just a matter of getting the remaining k_m's.

Ok, sorry, those last few paragraphs were rather abstract. Back to lunch. We just need to define these mysterious scaling constants and then we can order lunch. There is only one left, k_unsore. (In general there will be n-1 of them, where n is the size of M.) I think the easiest way to approach this is to let k_unsore = 1/5 and see what that implies:
g.U(sore & soup) = 1 ; g.U(sore & bagel) = 0
g.U(~sore & soup) = 0 ; g.U(~sore & bagel) = 1/5

EU(soup) = (1-P(~sore))*1 = 0.3
EU(bagel) = P(~sore)*k_unsore = 0.14
EU(soup) > EU(bagel)
After all the arithmetic, it looks like if k_unsore = 1/5, then even though we expect you to have non-sore teeth (P(sore) = 0.3), we are unsure enough, and the relative importance is big enough, that we should play it safe and go with the soup anyway. In general we would choose soup if P(~sore) < 1/(k_unsore+1), or equivalently, if k_unsore < (1-P(~sore))/P(~sore).
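If you want to check the algebra, here's a quick sketch that sweeps k_unsore and confirms the decision flips exactly at that threshold:

```python
# Sketch: sweep k_unsore and confirm the soup/bagel switch happens
# exactly at the threshold (1 - P(~sore)) / P(~sore) = 3/7.
p_unsore = 0.7
threshold = (1 - p_unsore) / p_unsore

for k_unsore in (0.2, 0.4, 0.5, 1.0):
    eu_soup = (1 - p_unsore) * 1.0     # soup only pays off when sore
    eu_bagel = p_unsore * k_unsore     # bagel only pays off when not sore
    choice = "soup" if eu_soup > eu_bagel else "bagel"
    predicted = "soup" if k_unsore < threshold else "bagel"
    print(k_unsore, choice, choice == predicted)
# 0.2 soup True / 0.4 soup True / 0.5 bagel True / 1.0 bagel True
```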
k is somehow the relative importance of possible preference structures under uncertainty. A smaller k in this lunch example means that the tastiness of a bagel over soup is small relative to the pain saved by eating the soup instead. With this intuition, we can see that 1/5 is a somewhat reasonable value for this scenario, while a value like 1, for example, would not be.
What if we are uncertain about k? Are we simply pushing the problem up some meta-chain? It turns out that no, we are not. Because k is linearly related to utility, you can simply use its expected value if it is uncertain.
It's kind of ugly to have these k_m's and these U_m's, so we can just reason over the product K x M instead of over M and K separately. This is nothing weird; it just means we have more utility function patches (many of which encode the exact same object-level preferences).
In the most general case, the utility function patches in K x M are the space of all functions O -> RR, with offset equivalence but not scale equivalence (sovereign utility functions have full linear-transform equivalence, but these patches are only equivalent under offset). Remember, though, that these are just restricted patches of a single global utility function.
So what is the point of all this? Are we just playing in the VNM sandbox, or is this result actually interesting for anything besides sore teeth?
Perhaps Moral/Preference Uncertainty? I didn't mention it until now because it's easier to think about lunch than a philosophical minefield, but it is the point of this post. Sorry about that. Let's conclude with everything restated in terms of moral uncertainty.
If we have:

- a set O of object-level outcomes,
- a set M of "epiphenomenal" (outside of O) 'moral' outcomes,
- a probability distribution over M, possibly correlated with uncertainty about O, but not in a way that allows our actions to influence uncertainty over M (that is, assuming moral facts cannot be changed by your actions),
- a utility function over O for each possible value of M (these can be arbitrary VNM-rational moral theories, as long as they share the same object-level outcomes),
- and a wish to be VNM rational over whatever uncertainty we have,

then we can quilt together a global utility function U: (M, K, O) -> RR, where U(m,k,o) = k*U_m(o), so that EU(o) is the sum over all m of P(m)*E(k | m)*U_m(o).
Somehow this all seems like legal VNM.
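To see the recipe end-to-end, here's a toy sketch; the two "moral theories", their credences, and their scales are all invented for illustration:

```python
# Toy quilt: EU(o) = sum over m of P(m) * E(k|m) * U_m(o).
P_m = {"theory_A": 0.6, "theory_B": 0.4}        # credence in each theory
E_k = {"theory_A": 1.0, "theory_B": 0.5}        # expected scale E(k|m)
U_m = {"theory_A": {"act1": 2.0, "act2": 0.0},  # object-level utilities
       "theory_B": {"act1": 0.0, "act2": 3.0}}

def eu(outcome):
    return sum(P_m[m] * E_k[m] * U_m[m][outcome] for m in P_m)

print(eu("act1"), eu("act2"))   # 1.2 0.6 -> act1 wins despite theory_B
```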
So. Just the possible object-level preferences and a probability distribution over those is not enough to define our behaviour. We need to know the scale for each so we know how to act when uncertain. This is analogous to the switch from ordinal preferences to interval preferences when dealing with object-level uncertainty.
Now we have a well-defined framework for reasoning about preference uncertainty, if all our possible moral theories are VNM rational, moral facts are immutable, and we have a joint probability distribution over K x M.
In particular, updating your moral beliefs upon hearing new arguments is no longer a mysterious dynamic; it is just a Bayesian update over possible moral theories.
This requires a "moral prior" that corellates moral outcomes and their relative scales to the observable evidence. In the lunch example, we implicitly used such a moral prior to update on observable thought experiments and conclude that
1/5 was a plausible value for
Moral evidence is probably things like preference thought-experiments, neuroscience and physics results, etc. The actual model for this, and discussion about the issues with defining and reasoning on such a prior are outside the scope of this post.
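As a toy illustration of such an update (the prior and the likelihoods assigned to a thought-experiment "observation" e are invented), it is an ordinary Bayes step:

```python
# Toy Bayes step over moral theories, given a piece of "moral
# evidence" e (e.g., an intuition pumped by a thought experiment).
prior      = {"theory_A": 0.5, "theory_B": 0.5}
likelihood = {"theory_A": 0.8, "theory_B": 0.2}   # P(e | theory), invented

z = sum(prior[m] * likelihood[m] for m in prior)           # P(e)
posterior = {m: prior[m] * likelihood[m] / z for m in prior}
print(posterior)   # {'theory_A': 0.8, 'theory_B': 0.2}
```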
This whole argument couldn't prove its way out of a wet paper bag, and is merely suggestive. Bits and pieces may be found incorrect, and formalization might change things a bit.
This framework requires that we have already worked out the outcome-space O (which we haven't), have limited our moral confusion to a set of VNM-rational moral theories over O (which we haven't), and have defined a "Moral Prior" so we can have a probability distribution over moral theories and their weights (which we haven't).
Nonetheless, we can sometimes get those things in special limited cases, and even in the general case, having a model for moral uncertainty and updating is a huge step up from the terrifying confusion I (and everyone I've talked to) had before working this out.
A few days ago I was rereading one of my favourite graphic novels. In it the supervillain commits mass murder to prevent nuclear war - he kills millions to save billions. This got me thinking about how a lot of LessWrong/Effective Altruism people approach existential risks (xrisks). An existential risk is one that threatens the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development (Bostrom 2002). I'm going to point out an implication of this approach, show how this conflicts with a number of intuitions, and then try to clarify the conflict.
If murder would reduce xrisk, one should commit the murder. The argument for this is that compared to billions or even trillions of future people, and/or the amount of valuable things they could instantiate (by experiencing happiness or pleasure, performing acts of kindness, creating great artworks, etc), the importance of one present person, and/or the badness of committing (mass) murder, is quite small. The large number on the 'future' side outweighs or cancels the far smaller number on the 'present' side.
I can think of a number of scenarios in which the murder of one or more people could quite clearly reduce existential risk: for instance, killing the only people who know the location of some secret refuge, so that it can never be revealed.
Indeed at the extreme it would seem that reducing xrisk would justify some truly terrible things, like a preemptive nuclear strike on a rogue country.
This implication does not just hold for simplistic act-utilitarians, or consequentialists more broadly - it affects any moral theory that accords moral weight to future people and doesn't forbid murder.
This implication is implicitly endorsed in a common choice many of us make between focusing our resources on xrisk reduction as opposed to extreme poverty reduction. This is sometimes phrased as being about choosing to save one life now or far more future lives. While bearing in mind some complications (such as the debate over doing vs allowing and the Doctrine of Double Effect), it seems that 'letting several people die from extreme poverty to try to reduce xrisk' is in an important way similar to 'killing several people to try to reduce xrisk'.
II. Simple Objection:
A natural reaction to this implication is that this is wrong, one shouldn't commit murder to reduce xrisk. To evade some simple objections let us assume that we can be highly sure that the (mass) murder will indeed reduce xrisk: maybe no-one will find out about the murder, or it won't open a position for someone even worse.
Let us try and explain this reaction, and offer an objection: The idea that we should commit (mass) murder conflicts with some deeply held intuitions, such as the intuition that one shouldn't kill, and the intuition that one shouldn't punish a wrong-doer before she/he commits a crime.
One response - the most prominent advocate of which is probably Peter Singer - is to cast doubt onto our intuitions. We may have these intuitions, but they may have been induced by various means, e.g. by evolution or society. Racist views were common in past societies. Moreover, there is some evidence that humans may have an evolutionary predisposition to be racist. Nevertheless we reject racism, and therefore (so the argument goes) we should reject a number of other intuitions. So perhaps we should reject the intuitions we have, shrug off the squeamishness, and agree that (mass) murder to reduce xrisk is justified.
[NB: I'm unsure about how convincing this response is. Two articles in Philosophy and Public Affairs dispute Singer's argument (Berker 2009) (Kamm 2009). One must also take into account the problem of applying our everyday intuitions to very unusual situations - see 'How Outlandish Can Imaginary Cases Be?' (Elster 2011)]
The trope of the supervillain justifying his or her crimes by claiming they had to be done for 'the greater good' (or similar) is well established; TV Tropes calls it Utopia Justifies The Means. I find myself slightly troubled when my moral beliefs lead me to agree with fictional supervillains. Nevertheless, is the best option to bite the bullet and side with the supervillains?
III. Complex Objection:
Let us return to the fictional example with which we started. Part of the reason his act seems wrong is that, in real life, the supervillain's mass murder was not necessary to prevent nuclear war - the Cold War ended without large-scale direct conflict between the USA and USSR. This seems to point the way to (some) clarification.
I find my intuitions change when the risk seems higher. While I'm unsure that murder is the right answer in the examples given above, it seems clearer in a situation where the disaster is in the midst of occurring, and murder or mass murder is the only way to prevent an existential disaster. The hypothetical that works for me is imagining some incredibly virulent disease or 'grey-goo' nano-replicator that has swept over Australia and is about to spread, and the only way to stop it is a nuclear strike.
One possibility is that my having a different intuition is simply because the situation is similar to hypotheticals that seem more familiar, such as shooting a hostage-taker or terrorist if that was the only way to prevent loss of innocent life.
But I'd like to suggest that it perhaps reflects a problem with xrisks, that it is the idea of doing something awful for a very uncertain benefit. The problem is the uncertainty. If a (mass) murder would prevent an existential disaster, then one should do it, but when it merely reduces xrisk it is less clear. Perhaps there should be some sort of probability threshold - if one has good reason to think the probability is over certain limits (10%, 50%, etc) then one is justified in committing gradually more heinous acts.
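To illustrate the tension between a raw expected-value rule and a probability threshold, here is a toy sketch; every number in it is invented for illustration, not a serious model:

```python
# Toy numbers only: a naive expected-value rule vs. a probability
# threshold for "justifying" a terrible act.
cost_in_lives = 1e6               # lives taken by the act
future_lives = 1e12               # lives protected if the act works
p_threshold = 0.10                # proposed justification threshold

for p in (1e-9, 1e-6, 1e-3, 0.10, 0.50):
    naive_ev_ok = p * future_lives > cost_in_lives
    threshold_ok = p >= p_threshold
    print(p, naive_ev_ok, threshold_ok)
# Naive EV already says "yes" for any p above 1e-6, far below the
# 10% threshold -- which is exactly the tension described above.
```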
In this post I've been trying to explain a troubling worry - to lay out my thinking - more than I have been trying to argue for or against an explicit claim. I have a problem with the claim that xrisk reduction is the most important task for humanity and/or me. On the one hand it seems convincing, yet on the other it seems to lead to some troubling implications - like justifying not focusing on extreme poverty reduction, or justifying (mass) murder.
Comments and criticism of the argument are welcomed. Also, I would be very interested in hearing people's opinions on this topic. Do you think that 'reducing xrisk' can justify murder? At what scale? Perhaps more importantly, does that bother you?
DISCLAIMER: I am in no way encouraging murder. Please do not commit murder.
Like many members of this community, reading the sequences has opened my eyes to a heavily neglected aspect of morality. Before reading the sequences I focused mostly on how to best improve people's wellbeing in the present and the future. However, after reading the sequences, I realized that I had neglected a very important question: In the future we will be able to create creatures with virtually any utility function imaginable. What sort of values should we give the creatures of the future? What sort of desires should they have, from what should they gain wellbeing?
Anyone familiar with the sequences should be familiar with the answer. We should create creatures with the complex values that human beings possess (call them "humane values"). We should avoid creating creatures with simple values that only desire to maximize one thing, like paperclips or pleasure.
It is important that future theories of ethics formalize this insight. I think we all know what would happen if we programmed an AI with conventional utilitarianism: It would exterminate the human race and replace them with creatures whose preferences are easier to satisfy (if you program it with preference utilitarianism) or creatures whom it is easier to make happy (if you program it with hedonic utilitarianism). It is important to develop a theory of ethics that avoids this.
Lately I have been trying to develop a modified utilitarian theory that formalizes this insight. My focus has been on population ethics. I am essentially arguing that population ethics should not just focus on maximizing welfare; it should also focus on what sort of creatures it is best to create. According to this theory of ethics, it is possible for a population with a lower total level of welfare to be better than a population with a higher total level of welfare, if the lower population consists of creatures that have complex humane values, while the higher welfare population consists of paperclip or pleasure maximizers. (I wrote a previous post on this, but it was long and rambling; I am trying to make this one more accessible.)
One of the key aspects of this theory is that it does not necessarily rate the welfare of creatures with simple values as unimportant. On the contrary, it considers it good for their welfare to be increased and bad for their welfare to be decreased. Because of this, it implies that we ought to avoid creating such creatures in the first place, so it is not necessary to divert resources from creatures with humane values in order to increase their welfare.
My theory does allow the creation of simple-value creatures for two reasons. One is if the benefits they generate for creatures with humane values outweigh the harms generated when humane-value creatures must divert resources to improving their welfare (companion animals are an obvious example of this). The second is if creatures with humane values are about to go extinct, and the only choices are replacing them with simple value creatures, or replacing them with nothing.
So far I am satisfied with the development of this theory. However, I have hit one major snag, and would love it if someone else could help me with it. The snag is formulated like this:
1. It is better to create a small population of creatures with complex humane values (that has positive welfare) than a large population of animals that can only experience pleasure or pain, even if the large population of animals has a greater total amount of positive welfare. For instance, it is better to create a population of humans with 50 total welfare than a population of animals with 100 total welfare.
2. It is bad to create a small population of creatures with humane values (that has positive welfare) and a large population of animals that are in pain. For instance, it is bad to create a population of animals with -75 total welfare, even if doing so allows you to create a population of humans with 50 total welfare.
3. However, it seems like, if creating human beings wasn't an option, it might be okay to create a very large population of animals, the majority of which have positive welfare, but some of which are in pain. For instance, it seems like it would be good to create a population of animals where one section of the population has 100 total welfare and another section has -75, since the total welfare is 25.
The problem is that this leads to what seems like a circular preference. If the population of animals with 100 welfare existed by itself it would be okay to not create it in order to create a population of humans with 50 welfare instead. But if the population we are talking about is the one in (3) then doing that would result in the population discussed in (2), which is bad.
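To make the circularity explicit, here is a sketch of the three judgments as pairwise "better than" relations (my own encoding; in particular, reading judgment 1 as licensing a swap of the +100 animal block for the +50 humans inside population 3 is an interpretive step):

```python
# Populations: HA    = humans +50 plus suffering animals -75
#              A_mix = animals +100 plus suffering animals -75 (net +25)
#              NONE  = create nothing
better_than = [
    ("HA", "A_mix"),     # judgment 1: humans beat the happy-animal block
    ("A_mix", "NONE"),   # judgment 3: net-positive animal world is good
    ("NONE", "HA"),      # judgment 2: humans + suffering animals is bad
]

def has_cycle_from(start):
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        for a, b in better_than:
            if a == node:
                if b == start:
                    return True        # returned to start: a cycle
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
    return False

print(has_cycle_from("HA"))   # True: HA > A_mix > NONE > HA
```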
My current solution to this dilemma is to include a stipulation that a population with negative utility can never be better than one with positive utility. This prevents me from having circular preferences about these scenarios. But it might create some weird problems. If population (2) is created anyway, and the humans in it are unable to help the suffering animals in any way, does that mean they have a duty to create lots of happy animals to get their population's utility up to a positive level? That seems strange, especially since creating the new happy animals won't help the suffering ones in any way. On the other hand, if the humans are able to help the suffering animals, and they do so by means of some sort of utility transfer, then it would be in their best interests to create lots of happy animals, to reduce the amount of utility each person has to transfer.
So far some of the solutions I am considering include:
1. Instead of focusing on population ethics, just consider complex humane values to have greater weight in utility calculations than pleasure or paperclips. I find this idea distasteful because it implies it would be acceptable to inflict large harms on animals for relatively small gains for humans. In addition, if the weight is not sufficiently great it could still lead to an AI exterminating the human race and replacing them with happy animals, since animals are easier to take care of and make happy than humans.
2. It is bad to create the human population in (2) if the only way to do so is to create a huge amount of suffering animals. But once both populations have been created, if the human population is unable to help the animal population, they have no duty to create as many happy animals as they can. This is because the two populations are not causally connected, and that is somehow morally significant. This makes some sense to me, as I don't think the existence of causally disconnected populations in the vast universe should bear any significance on my decision-making.
3. There is some sort of overriding consideration besides utility that makes (3) seem desirable. For instance, it might be bad for creatures with any sort of values to go extinct, so it is good to create a population to prevent this, as long as its utility is positive on the net. However, this would change in a situation where utility is negative, such as in (2).
4. Reasons to create a creature have some kind of complex rock-paper-scissors-type "trumping" hierarchy. In other words, the fact that the humans have humane values can override the reasons to create happy animals, but cannot override the reasons not to create suffering animals. The reasons to create happy animals, however, can override the reasons not to create suffering animals. I think that this argument might lead to inconsistent preferences again, but I'm not sure.
I find none of these solutions very satisfying. I would really appreciate it if someone could help me solve this dilemma. I'm very hopeful about this ethical theory, and would like to see it improved.
*Update. After considering the issue some more, I realized that my dissatisfaction came from conflating two different scenarios. I was treating the scenario "Animals with 100 utility and animals with -75 utility are created, no humans are created at all" as the same as the scenario "Humans with 50 utility and animals with -75 utility are created, then the humans (before they get to experience their 50 utility) are killed/harmed in order to create more animals without helping the suffering animals in any way." They are clearly not the same.
To make the analogy more obvious, imagine I was given a choice between creating a person who would experience 95 utility over the course of their life, or a person who would experience 100 utility over the course of their life. I would choose the person with 100 utility. But if the person destined to experience 95 utility already existed, but had not experienced the majority of that utility yet, I would oppose killing them and replacing them with the 100 utility person.
Or to put it more succinctly, I am willing to not create some happy humans to prevent some suffering animals from being created. And if the suffering animals and happy humans already exist I am willing to harm the happy humans to help the suffering animals. But if the suffering animals and happy humans already exist I am not willing to harm the happy humans to create some extra happy animals that will not help the existing suffering animals in any way.
Psychologists have discovered that while reading a book or story, people are prone to subconsciously adopt the behavior, thoughts, beliefs and internal responses of fictional characters as if they were their own.
Experts have dubbed this subconscious phenomenon ‘experience-taking,’ where people actually change their own behaviors and thoughts to match those of a fictional character that they can identify with.
Researchers from Ohio State University conducted a series of six experiments on about 500 participants, reported in the Journal of Personality and Social Psychology, and found that in the right situations 'experience-taking' may lead to temporary real-world changes in the lives of readers.
They found that stories written in the first-person can temporarily transform the way readers view the world, themselves and other social groups.
I always wondered at how Christopher Hitchens (who, when he wasn't being a columnist, was a professor of English literature) went on and on about the power of fiction for revealing moral truths. This gives me a better idea of how people could imprint on well-written fiction. More so than, say, logically-reasoned philosophical tracts.
This article is, of course, a popularisation. Anyone have links to the original paper?
Edit: Gwern delivers (PDF): Kaufman, G. F., & Libby, L. K. (2012, March 26). "Changing Beliefs and Behavior Through Experience-Taking." Journal of Personality and Social Psychology. Advance online publication. doi: 10.1037/a0027525
This relates to my recent post on existence in many-worlds.
I care about possible people. My child, if I ever have one, is one of them, and it seems monstrous not to care about one's children. There are many distinct ways of being a possible person:

1) You can be causally connected to some actual people in the actual world in some histories of that world.
2) You can be a counterpart of an actual person on a distinct world without causal connections.
3) You can be distinct from all actual individuals, and in a causally separate possible world.
4) You can be acausally connectable to actual people, but in distinct possible worlds.
Those 4 ways are not separate partitions without overlap; sometimes they overlap, and I don't believe they exhaust the scope of possible people. The most natural question to ask is "should we care equally about all kinds of possible people?" Some people are seriously studying this, and let us hope they give us accurate ways to navigate our complex universe. While we wait, some worries seem relevant:
1) The Multiverse is Sadistic Argument:
P1.1: If all possible people do their morally relevant thing (call it exist, if you will) and
P1.2: We cannot affect (causally or acausally) what is or not possible
C1.0: Then we cannot affect the morally relevant thing.
2) The Multiverse is Paralyzing (related)
P2.1: We have reason to care about X-Risk
P2.2: Worlds where X-Risk obtains are possible
P2.3: We have nearly as much reason to worry about possible non-actual1 worlds where X-risk obtains as we have to worry about actual1 worlds where it obtains
P2.4: There are infinitely more possible worlds where X-risk obtains than actual1 ones
C2.0: Infinitarian Paralysis
1Actual here means belonging to the same quantum branching history as you. If you think you have many quantum successors, all of them are actual, same for predecessors, and people who inhabit your Hubble volume.
3) Reality-Fluid Can't Be All That Is Left Argument
P3.1) If all possible people do their morally relevant thing
P3.2) The way in which we can affect what is possible is by giving some subsets of it more units of reality-fluid, or quantum measure
P3.3) In fact reality-fluid is a ratio, such as a percentage of successor worlds of kind A or kind B for a particular world W
P3.4) A possible World3 with 5% reality-fluid in relation to World1 is causally indistinguishable from itself with five times more reality-fluid (25%) in relation to World2.
P3.5) The morally relevant thing, though qualitative by constitution, seems to be quantifiable, and what matters is its absolute quantity, not any kind of ratio.
C3.1: From 3.2 and 3.3 -> We can actually affect only a quantity that is relative to our world, not an absolute quantity.
C3.2: From C3.1 and P 3.5 -> We can't affect the relevant thing.
C3.3: We ended up having to talk about reality fluid because decisions matter, and reality fluid is the thing that decision changes (from P3.4 we know it isn't causal structure). But if all that decision changes is some ratio between worlds, and what matters by P3.5 is not a ratio between worlds, we have absolutely no clue of what we are talking about when we talk about "the thing that matters" "what we should care about" and "reality fluid".
These arguments are offered not as perfectly logical, airtight argument structures, but to at least induce nausea about talk of Reality-Fluid, Measure, morally relevant things in many-worlds, and morally relevant people causally disconnected from us. Those are not things where you can Taboo the word away and keep the substance around. The problem does not lie in the word 'Existence', or in the sentence 'X is morally relevant'. It seems to me that the service that existence or reality used to perform doesn't make sense anymore (if all possible worlds exist, or if the Mathematical Universe Hypothesis is correct). We attempted to keep it around as a criterial determinant for What Matters. Yet now all that is left is this weird ratio that just can't be what matters. Without a criterial determinant for mattering, we are left in a position that makes me think we should head back towards a causal approach to morality. But this is an opinion, not a conclusion.
Edit: This post is an argument against the conjunctive truth of two things, Many Worlds, and the way in which we think of What Matters. It seems that the most natural interpretation of it is that Many Worlds is true, and thus my argument is against our notion of What Matters. In fact my position lies more in the opposite side - our notion of What Matters is (strongly related to) What Matters, so Many Worlds are less likely.
let me suggest a moral axiom with apparently very strong intuitive support, no matter what your concept of morality: morality should exist. That is, there should exist creatures who know what is moral, and who act on that. So if your moral theory implies that in ordinary circumstances moral creatures should exterminate themselves, leaving only immoral creatures, or no creatures at all, well that seems a sufficient reductio to solidly reject your moral theory.
I agree strongly with the above quote, and I think most other readers will as well. It is good for moral beings to exist and a world with beings who value morality is almost always better than one where they do not. I would like to restate this more precisely as the following axiom: A population in which moral beings exist and have net positive utility, and in which all other creatures in existence also have net positive utility, is always better than a population where moral beings do not exist.
While the axiom that morality should exist is extremely obvious to most people, there is one strangely popular ethical system that rejects it: total utilitarianism. In this essay I will argue that Total Utilitarianism leads to what I will call the Genocidal Conclusion, which is that there are many situations in which it would be fantastically good for moral creatures to either exterminate themselves, or greatly limit their utility and reproduction in favor of the utility and reproduction of immoral creatures. I will argue that the main reason consequentialist theories of population ethics produce such obviously absurd conclusions is that they continue to focus on maximizing utility1 in situations where it is possible to create new creatures. I will argue that pure utility maximization is only a valid ethical theory for "special case" scenarios where the population is static. I will propose an alternative theory for population ethics I call "ideal consequentialism" or "ideal utilitarianism" which avoids the Genocidal Conclusion and may also avoid the more famous Repugnant Conclusion.
I will begin my argument by pointing to a common problem in population ethics known as the Mere Addition Paradox (MAP) and the Repugnant Conclusion. Most Less Wrong readers will already be familiar with this problem, so I do not think I need to elaborate on it. You may also be familiar with an even stronger variation called the Benign Addition Paradox (BAP). This is essentially the same as the MAP, except that each time one adds more people one also gives a small amount of additional utility to the people who already existed. One then proceeds to redistribute utility between people as normal, eventually arriving at the huge population where everyone's lives are "barely worth living." The point of this is to argue that the Repugnant Conclusion can be arrived at from "mere addition" of new people that not only doesn't harm the pre-existing people, but actually benefits them.
The next step of my argument involves three slightly tweaked versions of the Benign Addition Paradox. I have not changed the basic logic of the problem, I have just added one small clarifying detail. In the original MAP and BAP it was not specified what sort of values the added individuals in population A+ held. Presumably one was meant to assume that they were ordinary human beings. In the versions of the BAP I am about to present, however, I will specify that the extra individuals added in A+ are not moral creatures, that if they have values at all they are values indifferent to, or opposed to, morality and the other values that the human race holds dear.
1. The Benign Addition Paradox with Paperclip Maximizers.
Let us imagine, as usual, a population, A, which has a large group of human beings living lives of very high utility. Let us then add a new population consisting of paperclip maximizers, each of whom is living a life barely worth living. Presumably, for a paperclip maximizer, this would be a life where the paperclip maximizer's existence results in at least one more paperclip in the world than there would have been otherwise.
Now, one might object that if one creates a paperclip maximizer and then allows it to create one paperclip, the utility of the other paperclip maximizers will increase above the "barely worth living" level, which would obviously make this thought experiment non-analogous with the original MAP and BAP. To prevent this we will assume that each paperclip maximizer that is created has slightly different values about the ideal size, color, and composition of the paperclip it is trying to produce. So the Purple 2-centimeter Plastic Paperclip Maximizer gains no additional utility when the Silver Iron 1-centimeter Paperclip Maximizer makes a paperclip.
So again, let us add these paperclip maximizers to population A, and in the process give one extra utilon of utility to each preexisting person in A. This is a good thing, right? After all, everyone in A benefited, and the paperclippers get to exist and make paperclips. So clearly A+, the new population, is better than A.
Now let's take the next step, the transition from population A+ to population B. Take some of the utility from the human beings and convert it into paperclips. This is a good thing, right?
So let us repeat these steps adding paperclip maximizers and utility, and then redistributing utility. Eventually we reach population Z, where there is a vast amount of paperclip maximizers, a vast amount of many different kinds of paperclips, and a small amount of human beings living lives barely worth living.
Obviously Z is better than A, right? We should not fear the creation of a paperclip maximizing AI, but welcome it! Forget about things like high challenge, love, interpersonal entanglement, complex fun, and so on! Those things just don't produce the kind of utility that paperclip maximization has the potential to do!
Or maybe there is something seriously wrong with the moral assumptions behind the Mere Addition and Benign Addition Paradoxes.
But you might argue that I am using an unrealistic example. Creatures like Paperclip Maximizers may be so far removed from normal human experience that we have trouble thinking about them properly. So let's replay the Benign Addition Paradox again, but with creatures we might actually expect to meet in real life, and we know we actually value.
2. The Benign Addition Paradox with Non-Sapient Animals
You know the drill by now. Take population A and add a new population to it, while very slightly increasing the utility of the original population. This time let's have it be some kind of animal that is capable of feeling pleasure and pain, but is not capable of modeling possible alternative futures and choosing between them (in other words, it is not capable of having "values" or being "moral"). A lizard or a mouse, for example. Each one feels slightly more pleasure than pain in its lifetime, so it can be said to have a life barely worth living. Convert A+ to B. Take the utilons that the human beings are using to experience things like curiosity, beatitude, wisdom, beauty, harmony, morality, and so on, and convert them into pleasure for the animals.
We end up with population Z, with a vast amount of mice or lizards with lives just barely worth living, and a small amount of human beings with lives barely worth living. Terrific! Why do we bother creating humans at all! Let's just create tons of mice and inject them full of heroin! It's a much more efficient way to generate utility!
3. The Benign Addition Paradox with Sociopaths
What new population will we add to A this time? How about some other human beings, who all have anti-social personality disorder? True, they lack the key, crucial value of sympathy that defines so much of human behavior. But they don't seem to miss it. And their lives are barely worth living, so obviously A+ has greater utility than A. If given a chance the sociopaths will reduce the utility of other people to negative levels, but let's assume that that is somehow prevented in this case.
Eventually we get to Z, with a vast population of sociopaths and a small population of normal human beings, all living lives just barely worth living. That has more utility, right? True, the sociopaths place no value on things like friendship, love, compassion, empathy, and so on. And true, the sociopaths are immoral beings who do not care in the slightest about right and wrong. But what does that matter? Utility is being maximized, and surely that is what population ethics is all about!
Let's suppose an asteroid is approaching each of the three population Zs discussed above. It can only be deflected by so much. Your choice is: save the original population of humans from A, or save the vast new population. The choice is obvious. In 1, 2, and 3, each individual has the same level of utility, so obviously we should choose the option that saves a greater number of individuals.

Bam! The asteroid strikes. The end result in all three scenarios is a world in which all the moral creatures are destroyed. It is a world without the many complex values that human beings possess. Each world, for the most part, lacks things like complex challenge, imagination, friendship, empathy, love, and the other complex values that human beings prize. But so what? The purpose of population ethics is to maximize utility, not silly, frivolous things like morality, or the other complex values of the human race. That means that any form of utility that is easier to produce than those values is obviously superior. It's easier to make pleasure and paperclips than it is to make eudaemonia, so that's the form of utility that ought to be maximized, right? And as for making sure moral beings exist, well, that's just ridiculous. The valuable processing power they're using to care about morality could be used to make more paperclips or more mice injected with heroin! Obviously it would be better if they died off, right?
I'm going to go out on a limb and say "Wrong."
Is this realistic?
Now, to be fair, in the Overcoming Bias post I quoted, Robin Hanson also says:
I’m not saying I can’t imagine any possible circumstances where moral creatures shouldn’t die off, but I am saying that those are not ordinary circumstances.
Maybe the scenarios I am proposing are just too extraordinary. But I don't think this is the case. I imagine that the circumstances Robin had in mind were probably something like "either all moral creatures die off, or all moral creatures are tortured 24/7 for all eternity."
Any purely utility-maximizing theory of population ethics that counts both the complex values of human beings, and the pleasure of animals, as "utility" should inevitably draw the conclusion that human beings ought to limit their reproduction to the bare minimum necessary to maintain the infrastructure to sustain a vastly huge population of non-human animals (preferably animals dosed with some sort of pleasure-causing drug). And if some way is found to maintain that infrastructure automatically, without the need for human beings, then the logical conclusion is that human beings are a waste of resources (as are chimps, gorillas, dolphins, and any other animal that is even remotely capable of having values or morality). Furthermore, even if the human race cannot practically be replaced with automated infrastructure, this should be an end result that the adherents of this theory should be yearning for.2 There should be much wailing and gnashing of teeth among moral philosophers that exterminating the human race is impractical, and much hope that someday in the future it will not be.
I call this the "Genocidal Conclusion" or "GC." On the macro level the GC manifests as the idea that the human race ought to be exterminated and replaced with creatures whose preferences are easier to satisfy. On the micro level it manifests as the idea that it is perfectly acceptable to kill someone who is destined to live a perfectly good and worthwhile life and replace them with another person who would have a slightly higher level of utility.
Population Ethics isn't About Maximizing Utility
I am going to make a rather radical proposal. I am going to argue that the consequentialist's favorite maxim, "maximize utility," only applies to scenarios where creating new people or creatures is off the table. I think we need an entirely different ethical framework to describe what ought to be done when it is possible to create new people. I am not by any means saying that "which option would result in more utility" is never a morally relevant consideration when deciding to create a new person, but I definitely think it is not the only one.3
So what do I propose as a replacement to utility maximization? I would argue in favor of a system that promotes a wide range of ideals. Doing some research, I discovered that G. E. Moore had in fact proposed a form of "ideal utilitarianism" in the early 20th century.4 However, I think that "ideal consequentialism" might be a better term for this system, since it isn't just about aggregating utility functions.
What are some of the ideals that an ideal consequentialist theory of population ethics might seek to promote? I've already hinted at what I think they are: Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom... mutual affection, love, friendship, cooperation; all those other important human universals, plus all the stuff in the Fun Theory Sequence. When considering what sort of creatures to create we ought to create creatures that value those things. Not necessarily, all of them, or in the same proportions, for diversity is an important ideal as well, but they should value a great many of those ideals.
Now, lest you worry that this theory has any totalitarian implications, let me make it clear that I am not saying we should force these values on creatures that do not share them. Forcing a paperclip maximizer to pretend to make friends and love people does not do anything to promote the ideals of Friendship and Love. Forcing a chimpanzee to listen while you read the Sequences to it does not promote the values of Truth and Knowledge. Those ideals require both a subjective and objective component. The only way to promote those ideals is to create a creature that includes them as part of its utility function and then help it maximize its utility.
I am also certainly not saying that there is never any value in creating a creature that does not possess these values. There are obviously many circumstances where it is good to create nonhuman animals. There may even be some circumstances where a paperclip maximizer could be of value. My argument is simply that it is most important to make sure that creatures who value these various ideals exist.
I am also not suggesting that it is morally acceptable to casually inflict horrible harms upon a creature with non-human values if we screw up and create one by accident. If promoting ideals and maximizing utility are separate values then it may be that once we have created such a creature we have a duty to make sure it lives a good life, even if it was a bad thing to create it in the first place. You can't unbirth a child.5
It also seems to me that in addition to having ideals about what sort of creatures should exist, we also have ideals about how utility ought to be concentrated. If this is the case then ideal consequentialism may be able to block some forms of the Repugnant Conclusion, even in situations where the only creatures whose creation is being considered are human beings. If it is acceptable to create humans instead of paperclippers, even if the paperclippers would have higher utility, it may also be acceptable to create ten humans with a utility of ten each (total 100) instead of a hundred humans with a utility of 1.01 each (total 101).
Why Did We Become Convinced that Maximizing Utility was the Sole Good?
Population ethics was, until comparatively recently, a fallow field in ethics. And in situations where there is no option to increase the population, maximizing utility is the only consideration that's really relevant. If you've created creatures that value the right ideals, then all that is left to be done is to maximize their utility. If you've created creatures that do not value the right ideals, there is no value to be had in attempting to force them to embrace those ideals. As I've said before, you will not promote the values of Love and Friendship by creating a paperclip maximizer and forcing it to pretend to love people and make friends.
So in situations where the population is constant, "maximize utility" is a decent approximation of the meaning of right. It's only when the population can be added to that morality becomes much more complicated.
Another thing to blame is human-centric reasoning. When people defend the Repugnant Conclusion they tend to point out that a life barely worth living is not as bad as it would seem at first glance. They emphasize that it need not be a boring life, it may be a life full of ups and downs where the ups just barely outweigh the downs. A life worth living, they say, is a life one would choose to live. Derek Parfit developed this idea to some extent by arguing that there are certain values that are "discontinuous" and that one needs to experience many of them in order to truly have a life worth living.
The Orthogonality Thesis throws all these arguments out the window. It is possible to create an intelligence to execute any utility function, no matter what it is. If human beings have all sorts of complex needs that must be fulfilled in order for them to lead worthwhile lives, then you could create more worthwhile lives by killing the human race and replacing them with something less finicky. Maybe happy cows. Maybe paperclip maximizers. Or how about some creature whose only desire is to live for one second and then die? If we created such a creature and then killed it, we would reap huge amounts of utility, for we would have created a creature that got everything it wanted out of life!
How Intuitive is the Mere Addition Principle, Really?
I think most people would agree that morality should exist, and that therefore any system of population ethics should not lead to the Genocidal Conclusion. But which step in the Benign Addition Paradox should we reject? We could reject the step where utility is redistributed. But that seems wrong; most people seem to consider it bad for animals and sociopaths to suffer, and acceptable to inflict at least some amount of disutility on human beings to prevent such suffering.
It seems more logical to reject the Mere Addition Principle. In other words, maybe we ought to reject the idea that the mere addition of more lives-worth-living cannot make the world worse. And in turn, we should probably also reject the Benign Addition Principle. Adding more lives-worth-living may be capable of making the world worse, even if doing so also slightly benefits existing people. Fortunately this isn't a very hard principle to reject. While many moral philosophers treat it as obviously correct, nearly everyone else rejects this principle in day-to-day life.
Now, I'm obviously not saying that people's behavior in their day-to-day lives is always good; it may be that they are morally mistaken. But I think the fact that so many people seem to implicitly reject it provides some sort of evidence against it.
Take people's decision to have children. Many people choose to have fewer children than they otherwise would because they do not believe they will be able to adequately care for them, at least not without inflicting large disutilities on themselves. If most people accepted the Mere Addition Principle there would be a simple solution for this: have more children and then neglect them! True, the children's lives would be terrible while they were growing up, but once they've grown up and are on their own there's a good chance they may be able to lead worthwhile lives. Not only that, it may be possible to trick the welfare system into giving you money for the children you neglect, which would satisfy the Benign Addition Principle.
Yet most people choose not to have children and then neglect them. Furthermore, they seem to think that they have a moral duty not to do so: that a world where they choose not to have neglected children is better than one where they do. What is wrong with them?
Another example is a common political view many people have. Many people believe that impoverished people should have fewer children because of the burden doing so would place on the welfare system. They also believe that it would be bad to get rid of the welfare system altogether. If the Benign Addition Principle were as obvious as it seems, they would instead advocate for the abolition of the welfare system, and encourage impoverished people to have more children. Assuming most impoverished people live lives worth living, this is exactly analogous to the BAP: it would create more people while benefiting existing ones (those who pay lower taxes once the welfare system is abolished).
Yet again, most people choose to reject this line of reasoning. The BAP does not seem to be an obvious and intuitive principle at all.
The Genocidal Conclusion is Really Repugnant
There is nearly nothing more repugnant than the Genocidal Conclusion. Pretty much the only way a line of moral reasoning could go more wrong would be concluding that we have a moral duty to cause suffering, as an end in itself. This means that it's fairly easy to counter any argument for total utilitarianism that points out that the alternative I am promoting has odd conclusions which clash with some of our moral intuitions, while total utilitarianism does not. Simply ask: is that odd conclusion more insane than the Genocidal Conclusion? If it isn't, total utilitarianism should still be rejected.
Ideal Consequentialism Needs a Lot of Work
I do think that Ideal Consequentialism needs some serious ironing out. I haven't really developed it into a logical and rigorous system; at this point it's barely even a rough framework. There are many questions that stump me. In particular I am not quite sure what population principle I should develop. It's hard to develop one that rejects the MAP without leading to weird conclusions, like that it's bad to create someone of high utility if a population of even higher utility existed long ago. It's a difficult problem to work on, and it would be interesting to see if anyone else had any ideas.
But just because I don't have an alternative fully worked out doesn't mean I can't reject Total Utilitarianism. It leads to the conclusion that a world with no love, curiosity, complex challenge, friendship, morality, or any other value the human race holds dear is an ideal, desirable world, if there is a sufficient amount of some other creature with a simpler utility function. Morality should exist, and because of that, total utilitarianism must be rejected as a moral system.
1. I have been asked to note that when I use the phrase "utility" I am usually referring to a concept that is called "E-utility," rather than the Von Neumann-Morgenstern utility that is sometimes discussed in decision theory. The difference is that in VNM one's moral views are included in one's utility function, whereas in E-utility they are not. So if one chooses to harm oneself to help others because one believes that is morally right, one has higher VNM utility, but lower E-utility.
2. There is a certain argument against the Repugnant Conclusion which holds that, as the steps of the Mere Addition Paradox are followed, the world will lose its last symphony, its last great book, and so on. I have always considered this to be an invalid argument, because the world of the RC doesn't necessarily have to be one where these things don't exist; it could be one where they exist, but are enjoyed very rarely. The Genocidal Conclusion brings this argument back in force. Creating creatures that can appreciate symphonies and great books is very inefficient compared to creating bunny rabbits pumped full of heroin.
3. Total Utilitarianism was originally introduced to population ethics as a possible solution to the Non-Identity Problem. I certainly agree that such a problem needs a solution, even if Total Utilitarianism doesn't work out as that solution.
4. I haven't read a lot of Moore, most of my ideas were extrapolated from other things I read on Less Wrong. I just mentioned him because in my research I noticed his concept of "ideal utilitarianism" resembled my ideas. While I do think he was on the right track he does commit the Mind Projection Fallacy a lot. For instance, he seems to think that one could promote beauty by creating beautiful objects, even if there were no creatures with standards of beauty around to appreciate them. This is why I am careful to emphasize that to promote ideals like love and beauty one must create creatures capable of feeling love and experiencing beauty.
5. My tentative answer to the question Eliezer poses in "You Can't Unbirth a Child" is that human beings may have a duty to allow the cheesecake maximizers to build some amount of giant cheesecakes, but they would also have a moral duty to limit such creatures' reproduction in order to spare resources to create more creatures with humane values.
EDITED: To make a point about ideal consequentialism clearer, based on AlexMennen's criticisms.
Stuart has worked on further developing the orthogonality thesis, which gave rise to a paper, a non-final version of which you can see here: http://lesswrong.com/lw/cej/general_purpose_intelligence_arguing_the/
This post won't make sense if you haven't been through that.
Today we spent some time going over it and he accepted my suggestion of a minor amendment, which best fits here.
Besides all the other awkward things that a moral convergentist would have to argue for, namely:
This argument generalises to other ways of producing the AI. Thus to deny the Orthogonality thesis is to assert that there is a goal system G, such that, among other things:
- There cannot exist any efficient real-world algorithm with goal G.
- If a being with arbitrarily high resources, intelligence, time and goal G were to try to design an efficient real-world algorithm with the same goal, it must fail.
- If a human society were highly motivated to design an efficient real-world algorithm with goal G, and were given a million years to do so along with huge amounts of resources, training and knowledge about AI, it must fail.
- If a high-resource human society were highly motivated to achieve the goals of G, then it could not do so (here the human society is seen as the algorithm).
- Same as above, for any hypothetical alien societies.
- There cannot exist any pattern of reinforcement learning that would train a highly efficient real-world intelligence to follow the goal G.
- There cannot exist any evolutionary or environmental pressures that would evolve highly efficient real-world intelligences to follow goal G.
While doing some reading on philosophy I came across some interesting questions about the nature of having desires and preferences. One, do you still have preferences and desires when you are unconscious? Two, if you don't, does this call into question the many moral theories that hold that having preferences and desires is what makes one morally significant, since mistreating temporarily unconscious people seems obviously immoral?
Philosophers usually discuss this question when debating the morality of abortion, but to avoid doing any mindkilling I won't mention that topic, except to say in this sentence that I won't mention it.
In more detail, the issue is this: a common, intuitive, and logical-seeming explanation for why it is immoral to destroy a typical human being, but not to destroy a rock, is that a typical human being has certain desires (or preferences or values, whatever you wish to call them; I'm using the terms interchangeably) that they wish to fulfill, and destroying them would hinder the fulfillment of these desires. A rock, by contrast, does not have any such desires, so it is not harmed by being destroyed. The problem with this is that it also seems immoral to harm a human being who is asleep, or is in a temporary coma. And, on the face of it, it seems plausible to say that an unconscious person does not have any desires. (And of course it gets even weirder when considering far-out concepts like a brain emulator that is saved to a hard drive, but isn't being run at the moment.)
After thinking about this it occurred to me that this line of reasoning could be taken further. If I am not thinking about my car at the moment, can I still be said to desire that it is not stolen? Do I stop having desires about things the instant my attention shifts away from them?
I have compiled a list of possible solutions to this problem, ranked in order from least plausible to most plausible.
1. One possibility would be to consider it immoral to harm a sleeping person because they will have desires in the future, even if they don't have any now. I find this argument extremely implausible because it has some extremely bizarre implications, some of which may lead to insoluble moral contradictions. For instance, this argument could be used to argue that it is immoral to destroy skin cells, because it is possible to use them to clone a new person, who will eventually grow up to have desires.
Furthermore, when human beings eventually gain the ability to build AIs that possess desires, this solution interacts with the orthogonality thesis in a catastrophic fashion. If it is possible to build an AI with any utility function, then for every potential AI one can construct, there is another potential AI that desires the exact opposite. That leads to total paralysis, since for every potential set of desires we are capable of satisfying there is another potential set that would be horribly thwarted.
Lastly, this argument implies that you can (and may be obligated to) help someone who doesn't exist, and never has existed, by satisfying their non-personal preferences, without ever having to bother with actually creating them. This seems strange; I can perhaps see an argument for respecting the once-existent preferences of those who are dead, but respecting the hypothetical preferences of the never-existed seems absurd. It also has the same problems with the orthogonality thesis that I mentioned earlier.
2. Make the same argument as solution 1, but somehow define the categories more narrowly, so that an unconscious person's ability to have desires in the future differs from that of an uncloned skin cell or an unbuilt AI. Michael Tooley has tried to do this by distinguishing between things that have the "possibility" of becoming a person with desires (e.g. skin cells) and things that have the "capacity" to have desires. This approach has been criticized, and I find myself pessimistic about it, because categories in real life have a tendency to be "fuzzy" and lack sharp borders.
3. Another solution may be that desires that one has had in the past continue to count, even when one is unconscious or not thinking about them. So it's immoral to harm unconscious people because before they were unconscious they had a desire not to be harmed, and it's immoral to steal my car because I desired that it not be stolen earlier when I was thinking about it.
I find this solution fairly convincing. The only major quibble I have with it is that it gives what some might consider a counter-intuitive result on a variation of the sleeping-person question. Imagine a nano-factory manufactures a sleeping person. This person is a new and distinct individual, and when they wake up they will proceed to behave as a typical human. This solution may suggest that it is okay to kill them before they wake up, since they haven't had any desires yet, which does seem odd.
4. Reject the claim that one doesn't have desires when one is unconscious, or when one is not thinking about a topic. The more I think about this solution, the more obvious it seems. Generally, when I am rationally deliberating about whether or not I desire something, I consider how many of my values and ideals it fulfills. It seems like my list of values and ideals remains fairly constant, and that even if I am focusing my attention on one value at a time, it makes sense to say that I still "have" the other values I am not focusing on at the moment.
Obviously I don't think that there's some portion of my brain where my "values" are stored in a neat little Excel spreadsheet. But they do seem to be a persistent part of its structure in some fashion. And it makes sense that they'd still be part of its structure when I'm unconscious. If they weren't, wouldn't my preferences change radically every time I woke up?
In other words, it's bad to harm an unconscious person because they have desires, preferences, values, whatever you wish to call them, that harming them would violate. And those values are a part of the structure of their mind that doesn't go away when they sleep. Skin cells and unbuilt AIs, by contrast, have no such values.
Now, while I think that explanation 4 resolves the issue of desires and unconsciousness best, I do think solution 3 has a great deal of truth to it as well (for instance, I tend to respect the final wishes of a dead person because they had desires in the past, even if they don't now). Solutions 3 and 4 are not incompatible at all, so one can believe in both of them.
I'm curious as to what people think of my possible solutions. Am I right about people still having something like desires in their brain when they are unconscious?
So morality has a lot to do with logic — indeed I have argued that moral reasoning is a type of applied logical reasoning — but it is not logic “all the way down,” it is anchored by certain contingent facts about humanity, bonoboness and so forth.
But, despite Yudkowsky’s confident claim, morality isn’t a matter of logic “all the way down,” because it has to start with some axioms, some brute facts about the type of organisms that engage in moral reasoning to begin with. Those facts don’t come from physics (though, like everything else, they better be compatible with all the laws of physics), they come from biology. A reasonable theory of ethics, then, can emerge only from a combination of biology (by which I mean not just evolutionary biology, but also cultural evolution) and logic.
Let's imagine a life extension drug has been discovered. One dose of this drug extends one's life by 49.99 years. This drug also has a mild cumulative effect: if it is given to someone who has been dosed with it before, it will extend their life by 50 years.
Under these constraints the most efficient way to maximize the amount of life extension this drug can produce is to give every dose to one individual. If there were one dose available for each of the seven billion people alive on Earth, then giving every person one dose would result in a total of 349,930,000,000 years of life gained. If one person were given all the doses, a total of 349,999,999,999.99 years of life would be gained. Sharing the life extension drug equally would result in a net loss of almost 70 million years of life. If you're concerned about people's reaction to this policy, we could make it a big lottery, where every person on Earth gets a chance to gamble their dose for a chance at all of them.
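For readers who want to check the arithmetic, here is a minimal Python sketch (the constants are just the numbers from the scenario above):

```python
# Sanity check of the life-extension arithmetic from the scenario.
PEOPLE = 7_000_000_000
FIRST, LATER = 49.99, 50.00   # years from a first dose / a repeat dose

shared = PEOPLE * FIRST                  # everyone gets exactly one dose
hoarded = FIRST + (PEOPLE - 1) * LATER   # one person gets every dose

print(f"{shared:,.2f} years if shared")       # 349,930,000,000.00
print(f"{hoarded:,.2f} years if hoarded")     # 349,999,999,999.99
print(f"{hoarded - shared:,.2f} years lost by sharing")  # 69,999,999.99
```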
Now, one could make certain moral arguments in favor of sharing the drug. I'll get to those later. However, it seems to me that gambling your dose for a chance at all of them isn't rational from a purely self-interested point of view either. You will not win the lottery. Your chances of winning this particular lottery are about one in seven billion, roughly forty times worse than your chances of winning the Powerball jackpot. If someone gave me a dose of the drug, and then offered me a chance to gamble in this lottery, I'd accuse them of Pascal's mugging.
Here's an even scarier thought experiment. Imagine we invent the technology for whole brain emulation. Let "x" equal the amount of resources it takes to sustain a WBE through 100 years of life. Let's imagine that with this particular type of technology, it costs 10x to convert a human into a WBE and it costs 100x to sustain a biological human through the course of their natural life. Let's have the cost of making multiple copies of a WBE once they have been converted be close to 0.
Again, under these constraints it seems like the most effective way to maximize the amount of life extension done is to convert one person into a WBE, then kill everyone else and use the resources that were sustaining them to make more WBEs, or extend the life of more WBEs. Again, if we are concerned about people's reaction to this policy we could make it a lottery. And again, if I was given a chance to play in this lottery I would turn it down and consider it a form of Pascal's mugging.
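A rough back-of-the-envelope comparison of the two options, assuming (my assumption, since the scenario leaves it unspecified) that a biological human has about 80 remaining life-years:

```python
# Back-of-the-envelope for the WBE scenario. X is an arbitrary resource
# unit; NATURAL_LIFE is an assumption the scenario does not pin down.
X = 1.0
POP = 7_000_000_000
NATURAL_LIFE = 80   # assumed remaining biological life-years per person

budget = 100 * X * POP   # the cost of sustaining everyone biologically

bio_years = POP * NATURAL_LIFE             # keep everyone biological
wbe_years = (budget - 10 * X) / X * 100    # convert one person, run WBEs

print(f"{bio_years:.2e} biological life-years")  # ~5.60e+11
print(f"{wbe_years:.2e} WBE life-years")         # ~7.00e+13, about 125x more
```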
I'm sure that most readers, like myself, would find these policies very objectionable. However, I have trouble finding objections to them from the perspective of classical utilitarianism. Indeed, most people have probably noticed that these scenarios are very similar to Nozick's "utility monster" thought experiment. I have made a list of possible objections to these scenarios that I have been considering:
1. First, let's deal with the unsatisfying practical objections. In the case of the drug example, it seems likely that a more efficient form of life extension will be developed in the future. In that case it would be better to give everyone the drug to sustain them until that time. However, this objection, like most practical ones, seems unsatisfying. It seems like there are strong moral objections to not sharing the drug.
Another pragmatic objection is that, in the case of the drug scenario, the lucky winner of the lottery might miss their friends and relatives who have died. And in the WBE scenario it seems like the lottery winner might get lonely being the only person on Earth. But again, this is unsatisfying. If the lottery winner were allowed to share their winnings with their immediate social circle, or if they were a sociopathic loner who cared nothing for others, it still seems bad that they end up killing everyone else on Earth.
2. One could use the classic utilitarian argument in favor of equality: diminishing marginal utility. However, I don't think this works. Humans don't seem to experience diminishing returns from lifespan in the same way they do from wealth. It's absurd to argue that a person who lives to the ripe old age of 60 generates less utility than two people who die at age 30 (all other things being equal). The reason the diminishing-marginal-utility argument works when arguing for equality of wealth is that people are limited in their ability to get utility from their wealth, because there is only so much time in the day to spend enjoying it. Extended lifespan removes this restriction, making a longer-lived person essentially a utility monster.
3. My intuitions about the lottery could be mistaken. It seems to me that if I were offered the possibility of gambling my dose of life extension drug with just one other person, I still wouldn't do it. If I understand probabilities correctly, gambling for a chance at living either 0 or 99.99 additional years is equivalent, in expectation, to a certain additional 49.995 years of life, which is better than the certain 49.99 years of life I'd have if I didn't make the gamble. But I still wouldn't do it, partly because I'd be afraid I'd lose and partly because I wouldn't want to kill the person I was gambling with.
So maybe my horror at these scenarios is driven by that same hesitancy. Maybe I just don't understand the probabilities right. But even if that is the case, even if it is rational for me to gamble my dose with just one other person, it doesn't seem like the gambling would scale. I will not win the "lifetime lottery."
4. Finally, we have those moral objections I mentioned earlier. Utilitarianism is a pretty awesome moral theory under most circumstances. However, when it is applied to scenarios involving population growth, or scenarios where one individual is vastly better at converting resources into utility than their fellows, it tends to produce very scary results. If we accept the complexity of value thesis (and I think we should), this suggests that there are other moral values that are not salient in the "special case" of scenarios without population growth or utility monsters, but that become relevant in scenarios which include them.
For instance, it may be that prioritarianism is better than pure utilitarianism, in which case sharing the life extension method might be best because of the benefits it accords the least well off. Or it may be (in the case of the WBE example) that having a large number of unique, worthwhile lives in the world is valuable because it produces experiences like love, friendship, and diversity.
My tentative guess at the moment is that there probably are some other moral values that make the scenarios I described morally suboptimal, even though they seem to make sense from a utilitarian perspective. However, I'm interested in what other people think. Maybe I'm missing something really obvious.
EDIT: To make it clear, when I refer to "amount of years added" I am assuming for simplicity's sake that all the years added are years that the person whose life is being extended wants to live and contain a large amount of positive experiences. I'm not saying that lifespan is exactly equivalent to utility. The problem I am trying to resolve is that it seems like the scenarios I've described seem to maximize the number of positive events it is possible for the people in the scenario to experience, even though they involve killing the majority of people involved. I'm not sure "positive experiences" is exactly equivalent to "utility" either, but it's likely a much closer match than lifespan.
What do I mean by "morality isn't logical"? I mean in the same sense that mathematics is logical but literary criticism isn't: the "reasoning" we use to think about morality doesn't resemble logical reasoning. All systems of logic, that I'm aware of, have a concept of proof and a method of verifying with high degree of certainty whether an argument constitutes a proof. As long as the logic is consistent (and we have good reason to think that many of them are), once we verify a proof we can accept its conclusion without worrying that there may be another proof that makes the opposite conclusion. With morality though, we have no such method, and people all the time make moral arguments that can be reversed or called into question by other moral arguments. (Edit: For an example of this, see these posts.)
Without being a system of logic, moral philosophical reasoning likely (or at least plausibly) doesn't have any of the nice properties that a well-constructed system of logic would have, for example, consistency, validity, soundness, or even the more basic property that considering arguments in a different order, or in a different mood, won't cause a person to accept an entirely different set of conclusions. For all we know, somebody trying to reason about a moral concept like "fairness" may just be taking a random walk as they move from one conclusion to another based on moral arguments they encounter or think up.
In a recent post, Eliezer said "morality is logic", by which he seems to mean... well, I'm still not exactly sure what, but one interpretation is that a person's cognition about morality can be described as an algorithm, and that algorithm can be studied using logical reasoning. (Which of course is true, but in that sense both math and literary criticism as well as every other subject of human study would be logic.) In any case, I don't think Eliezer is explicitly claiming that an algorithm-for-thinking-about-morality constitutes an algorithm-for-doing-logic, but I worry that the characterization of "morality is logic" may cause some connotations of "logic" to be inappropriately sneaked into "morality". For example Eliezer seems to (at least at one point) assume that considering moral arguments in a different order won't cause a human to accept an entirely different set of conclusions, and maybe this is why. To fight this potential sneaking of connotations, I suggest that when you see the phrase "morality is logic", remind yourself that morality isn't logical.
In a previous post, I argued that nihilism is often short-changed around here. However, I'm far from certain that it is correct, and in the meantime I think we should be careful not to discard our values one at a time by engaging in "selective nihilism" when faced with an ontological crisis, without even realizing that's what's happening. Karl recently reminded me of the post Timeless Identity by Eliezer Yudkowsky, which I noticed seems to be an instance of this.
As I mentioned in the previous post, our values seem to be defined in terms of a world model where people exist as ontologically primitive entities ruled heuristically by (mostly intuitive understandings of) physics and psychology. In this kind of decision system, both identity-as-physical-continuity and identity-as-psychological-continuity make perfect sense as possible values, and it seems humans do "natively" have both values. A typical human being is both reluctant to step into a teleporter that works by destructive scanning, and unwilling to let their physical structure be continuously modified into a psychologically very different being.
If faced with the knowledge that physical continuity doesn't exist in the real world at the level of fundamental physics, one might conclude that it's crazy to continue to value it, and this is what Eliezer's post argued. But if we apply this reasoning in a non-selective fashion, wouldn't we also conclude that we should stop valuing things like "pain" and "happiness" which also do not seem to exist at the level of fundamental physics?
In our current environment, there is widespread agreement among humans as to which macroscopic objects at time t+1 are physical continuations of which macroscopic objects existing at time t. We may not fully understand what exactly it is we're doing when judging such physical continuity, and the agreement tends to break down when we start talking about more exotic situations, and if/when we do fully understand our criteria for judging physical continuity it's unlikely to have a simple definition in terms of fundamental physics, but all of this is true for "pain" and "happiness" as well.
I suggest we keep all of our (potential/apparent) values intact until we have a better handle on how we're supposed to deal with ontological crises in general. If we convince ourselves that we should discard some value, and that turns out to be wrong, the error may be unrecoverable once we've lived with it long enough.
Imagine a robot that was designed to find and collect spare change around its owner's house. It had a world model where macroscopic everyday objects are ontologically primitive and ruled by high-school-like physics and (for humans and their pets) rudimentary psychology and animal behavior. Its goals were expressed as a utility function over this world model, which was sufficient for its designed purpose. All went well until one day, a prankster decided to "upgrade" the robot's world model to be based on modern particle physics. This unfortunately caused the robot's utility function to instantly throw a domain error exception (since its inputs are no longer the expected list of macroscopic objects and associated properties like shape and color), thus crashing the controlling AI.
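To make the failure mode concrete, here is a minimal Python sketch; the class and function names are hypothetical, invented purely for illustration, not anyone's actual design:

```python
# Toy illustration of the spare-change robot's crash. All names hypothetical.
class MacroObject:
    """An ontologically primitive everyday object in the old world model."""
    def __init__(self, kind, value=0.0):
        self.kind = kind      # e.g. "coin", "couch", "cat"
        self.value = value    # monetary value, if any

def utility(world):
    # Defined only over lists of MacroObjects: total value of loose coins.
    return sum(obj.value for obj in world if obj.kind == "coin")

old_world = [MacroObject("coin", 0.25), MacroObject("couch")]
print(utility(old_world))   # 0.25 -- works fine

# The prankster's "upgrade": the world is now particles, not objects.
new_world = [("electron", (0.1, 0.2, 0.3)), ("up-quark", (0.4, 0.5, 0.6))]
print(utility(new_world))   # AttributeError -- the utility function's
                            # domain no longer matches the world model
```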
According to Peter de Blanc, who used the phrase "ontological crisis" to describe this kind of problem,
Human beings also confront ontological crises. We should find out what cognitive algorithms humans use to solve the same problems described in this paper. If we wish to build agents that maximize human values, this may be aided by knowing how humans re-interpret their values in new ontologies.
I recently realized that a couple of problems that I've been thinking over (the nature of selfishness and the nature of pain/pleasure/suffering/happiness) can be considered instances of ontological crises in humans (although I'm not so sure we necessarily have the cognitive algorithms to solve them). I started thinking in this direction after writing this comment:
This formulation or variant of TDT requires that before a decision problem is handed to it, the world is divided into the agent itself (X), other agents (Y), and "dumb matter" (G). I think this is misguided, since the world doesn't really divide cleanly into these 3 parts.
What struck me is that even though the world doesn't divide cleanly into these 3 parts, our models of the world actually do. In the world models that we humans use on a day to day basis, and over which our utility functions seem to be defined (to the extent that we can be said to have utility functions at all), we do take the Self, Other People, and various Dumb Matter to be ontologically primitive entities. Our world models, like the coin collecting robot's, consist of these macroscopic objects ruled by a hodgepodge of heuristics and prediction algorithms, rather than microscopic particles governed by a coherent set of laws of physics.
For example, the amount of pain someone is experiencing doesn't seem to exist in the real world as an XML tag attached to some "person entity", but that's pretty much how our models of the world work, and perhaps more importantly, that's what our utility functions expect their inputs to look like (as opposed to, say, a list of particles and their positions and velocities). Similarly, a human can be selfish just by treating the object labeled "SELF" in its world model differently from other objects, whereas an AI with a world model consisting of microscopic particles would need to somehow inherit or learn a detailed description of itself in order to be selfish.
To fully confront the ontological crisis that we face, we would have to upgrade our world model to be based on actual physics, and simultaneously translate our utility functions so that their domain is the set of possible states of the new model. We currently have little idea how to accomplish this, and instead what we do in practice is, as far as I can tell, keep our ontologies intact and utility functions unchanged, but just add some new heuristics that in certain limited circumstances call out to new physics formulas to better update/extrapolate our models. This is actually rather clever, because it lets us make use of updated understandings of physics without ever having to, for instance, decide exactly what patterns of particle movements constitute pain or pleasure, or what patterns constitute oneself. Nevertheless, this approach hardly seems capable of being extended to work in a future where many people may have nontraditional mind architectures, or have a zillion copies of themselves running on all kinds of strange substrates, or be merged into amorphous group minds with no clear boundaries between individuals.
By the way, I think nihilism often gets short-changed around here. Given that we do not actually have at hand a solution to ontological crises in general, or to the specific crisis that we face, what's wrong with saying that the solution set may just be null? Given that evolution doesn't constitute a particularly benevolent and farsighted designer, perhaps we may not be able to do much better than that poor spare-change-collecting robot? If Eliezer is worried that actual AIs facing actual ontological crises could do worse than just crash, should we be very sanguine that for humans everything must "add up to moral normality"?
To expand a bit more on this possibility, many people have an aversion against moral arbitrariness, so we need at a minimum a utility translation scheme that's principled enough to pass that filter. But our existing world models are a hodgepodge put together by evolution so there may not be any such sufficiently principled scheme, which (if other approaches to solving moral philosophy also don't pan out) would leave us with legitimate feelings of "existential angst" and nihilism. One could perhaps still argue that any current such feelings are premature, but maybe some people have stronger intuitions than others that these problems are unsolvable?
Do we have any examples of humans successfully navigating an ontological crisis? The LessWrong Wiki mentions loss of faith in God:
In the human context, a clear example of an ontological crisis is a believer’s loss of faith in God. Their motivations and goals, coming from a very specific view of life suddenly become obsolete and maybe even nonsense in the face of this new configuration. The person will then experience a deep crisis and go through the psychological task of reconstructing its set of preferences according the new world view.
But I don't think loss of faith in God actually constitutes an ontological crisis, or if it does, certainly not a very severe one. An ontology consisting of Gods, Self, Other People, and Dumb Matter just isn't very different from one consisting of Self, Other People, and Dumb Matter (the latter could just be considered a special case of the former with quantity of Gods being 0), especially when you compare either ontology to one made of microscopic particles or even less familiar entities.
But to end on a more positive note, realizing that seemingly unrelated problems are actually instances of a more general problem gives some hope that by "going meta" we can find a solution to all of these problems at once. Maybe we can solve many ethical problems simultaneously by discovering some generic algorithm that can be used by an agent to transition from any ontology to another?
(Note that I'm not saying this is the right way to understand one's real preferences/morality, but just drawing attention to it as a possible alternative to other more "object level" or "purely philosophical" approaches. See also this previous discussion, which I recalled after writing most of the above.)
I propose that it is altruistic to be replaceable, and that therefore those who strive to be altruistic should strive to be replaceable.
As far as I can Google, this does not seem to have been proposed before. LW should be a good place to discuss it. A community interested in rational and ethical behavior, and in how superintelligent machines may decide to replace mankind, should at least bother to refute the following argument.
Replaceability is "the state of being replaceable". It isn't binary. The price of the replacement matters: so a cookie is more replaceable than a big wedding cake. Adequacy of the replacement also makes a difference: a piston for an ancient Rolls Royce is less replaceable than one in a modern car, because it has to be hand-crafted and will be distinguishable. So something is more or less replaceable depending on the price and quality of its replacement.
Replaceability could be thought of as the inverse of the cost of having to replace something. Something that's very replaceable has a low cost of replacement, while something that lacks replaceability has a high (up to unfeasible) cost of replacement. The cost of replacement plays into Total Cost of Ownership, and everything economists know about that applies. It seems pretty obvious that replaceability of possessions is good, much like cheap availability is good.
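As a toy formalization of this (my own, purely illustrative, not an established measure), one could score replaceability as the adequacy of the replacement divided by its cost:

```python
def replaceability(replacement_cost, adequacy):
    """Toy score: adequacy of the replacement (0 < adequacy <= 1)
    divided by the cost of obtaining it. Higher = more replaceable."""
    return adequacy / replacement_cost

# Hypothetical numbers, echoing the examples above:
print(replaceability(replacement_cost=2.0, adequacy=1.0))    # a cookie
print(replaceability(replacement_cost=500.0, adequacy=0.9))  # a wedding cake
```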
Some things (historical artifacts, art pieces) are valued highly precisely because of their irreplaceability. Although a few things could be said about the resale value of such objects, I'll simplify and contend these valuations are not rational.
The practical example
Anne manages the central database of Beth's company. She's the only one who has access to that database, the skillset required for managing it, and an understanding of how it all works; she has a monopoly on that combination.
This monopoly gives Anne control over her own replacement cost. If she works according to the state of the art, writes extensive and up-to-date documentation, makes proper backups, etc., she can be very replaceable, because her monopoly will be easily broken. If she refuses to explain what she's doing, creates weird and fragile workarounds, and documents the database badly, she can reduce her replaceability and defend her monopoly. (A well-obfuscated database can take months for a replacement database manager to handle confidently.)
So Beth may still choose to replace Anne, but Anne can influence how expensive that'll be for Beth. She can at least make sure her replacement needs to be shown the ropes, so she can't be fired on a whim. But she might go further and practically hold the database hostage, which would certainly help her in salary negotiations if she does it right.
This makes it pretty clear how Anne can act altruistically in this situation, and how she can act selfishly. Doesn't it?
The moral argument
To Anne, her replacement cost is an externality and an influence on the length and terms of her employment. To maximize the length of her employment and her salary, her replacement cost would have to be high.
To Beth, Anne's replacement cost is part of the cost of employing her and of course she wants it to be low. This is true for any pair of employer and employee: Anne is unusual only in that she has a great degree of influence on her replacement cost.
Therefore, if Anne documents her database properly etc, this increases her replaceability and constitutes altruistic behavior. Unless she values the positive feeling of doing her employer a favor more highly than she values the money she might make by avoiding replacement, this might even be true altruism.
Unless I suck at Google, replaceability doesn't seem to have been discussed as an aspect of altruism. The two reasons for that I can see are:
- replacing people is painful to think about
- and it seems futile as long as people aren't replaceable in more than very specific functions anyway.
But we don't want or get the choice to kill one person to save the life of five, either, and such practical improbabilities shouldn't stop us from considering our moral decisions. This is especially true in a world where copies, and hence replacements, of people are starting to look possible at least in principle.
- In some reasonably-near future, software is getting better at modeling people. We still don't know what makes a process intelligent, but we can feed a couple of videos and a bunch of psychological data points into a people modeler, extrapolate everything else using a standard population and the resulting model can have a conversation that could fool a four-year-old. The technology is already good enough for models of pets. While convincing models of complex personalities are at least another decade away, the tech is starting to become good enough for senile grandmothers.
Obviously no-one wants granny to die. But the kids would like to keep a model of granny, and they'd like to make the model before the Alzheimer's gets any worse, while granny is terrified she'll get no more visits to her retirement home.
What's the ethical thing to do here? Surely the relatives should keep visiting granny. Could granny maybe have a model made, but keep it to herself, for release only through her Last Will and Testament? And wouldn't it be truly awful of her to refuse to do that?
- Only slightly further into the future, we're still mortal, but cryonics does appear to be working. Unfrozen people need regular medical aid, but the technology is only getting better and anyway, the point is: something we can believe to be them can indeed come back.
Some refuse to wait out these Dark Ages; they get themselves frozen for nonmedical reasons, to fastforward across decades or centuries into a time when the really awesome stuff will be happening, and to get the immortality technologies they hope will be developed by then.
In this scenario, wouldn't fastforwarders be considered selfish, because they impose on their friends the pain of their absence? And wouldn't their friends mind it less if the fastforwarders went to the trouble of having a good model (see above) made first?
- On some distant future Earth, minds can be uploaded completely. Brains can be modeled and recreated so effectively that people can make living, breathing copies of themselves and experience the inability to tell which instance is the copy and which is the original.
Of course many adherents of soul theories reject this as blasphemous. A few more sophisticated thinkers worry that this devalues individuals to the point where superhuman AIs might conclude that, as long as copies of everyone are stored on some hard drive orbiting Pluto, nothing of value is lost if every meatbody gets devoured into more hardware. The bottom line is: effective immortality is available, but some refuse it on principle.
In this world, wouldn't those who make themselves fully and infinitely replaceable want the same for everyone they love? Wouldn't they consider it a dreadful imposition if a friend or relative refused immortality? After all, wasn't not having to say goodbye anymore kind of the point?
These questions haven't come up in the real world because people have never been replaceable in more than very specific functions. But I hope you'll agree that if and when people become more replaceable, that will be regarded as a good thing, and it will be regarded as virtuous to use these technologies as they become available, because it spares one's friends and family some or all of the cost of replacing oneself.
Replaceability as an altruist virtue
And if replaceability is altruistic in this hypothetical future, as well as in the limited sense of Anne and Beth, that implies replaceability is altruistic now. And even now, there are things we can do to increase our replaceability, i.e. to reduce the cost our bereaved will incur when they have to replace us. We can teach all our (valuable) skills, so others can replace us as providers of these skills. We can not have (relevant) secrets, so others can learn what we know and replace us as sources of that knowledge. We can endeavour to live as long as possible, to postpone the cost. We can sign up for cryonics. There are surely other things each of us could do to increase our replaceability, but I can't think of any an altruist wouldn't consider virtuous.
As an altruist, I conclude that replaceability is a prosocial, unselfish trait, something we'd want our friends to have, in other words: a virtue. I'd go as far as to say that even bothering to set up a good Last Will and Testament is virtuous precisely because it reduces the cost my bereaved will incur when they have to replace me. And although none of us can be truly easily replaceable as of yet, I suggest we honor those who make themselves replaceable, and are proud of whatever replaceability we ourselves attain.
So, how replaceable are you?
In Robert Nozick's famous "Utility Monster" thought experiment he proposes the idea of a creature that does not receive diminishing marginal utility from resource consumption, and argues that this poses a problem for utilitarian ethics. Why? Utilitarian ethics, while highly egalitarian in real life situations, does not place any intrinsic value on equality. The reason utilitarian ethics tend to favor equality is that human beings seem to experience diminishing returns when converting resources into utility. Egalitarianism, according to this framework, is good because sharing resources between people reduces the level of diminishing returns and maximizes the total amount of utility people generate, not because it's actually good for people to have equal levels of utility.
The problem the Utility Monster poses is that, since it does not receive diminishing marginal utility, there is no reason, under a traditional utilitarian framework, to share resources between it and the other inhabitants of the world it lives in. It would be completely justified in killing other people and taking their things for itself, or enslaving them for its own benefit. This seems counter-intuitive to Nozick, and many other people.
There seem to be two possible reasons for this. One, of course, is that most people's intuitions are wrong in this particular case. The reason I am interested in exploring, however, is the other one: namely, that equality is valuable for its own sake, not just as a side effect of diminishing marginal utility.
Now, before I go any further I should clarify what I mean by "equality." There are many different types of equality, not all of which are compatible with each other. What I mean is equality of utility: everyone has the same level of satisfied preferences, happiness, and whatever else "utility" consists of. This is not the same thing as fiscal equality, as some people may differ in their ability to convert money and resources into utility (people with horrible illnesses, for instance, are worse at doing so than the general population). It is also important to stress that "lifespan" should be factored in as part of the utility that is to be equalized (i.e. killing someone increases inequality). Otherwise one could achieve equality of utility by killing all the poor people.
So if equality is valuable for its own sake, how does one factor it into utilitarian calculations? It seems wrong to replace utility maximization with equality maximization. That would imply that a world where everyone had 10 utilons and a world where everyone had 100 utilons are morally identical, which seems wrong, to say the least.
What about making equality lexically prior to utility maximization? That seems just as bad. It would imply, among other things, that in a stratified world where some people have far greater levels of utility than others, it would be morally right to take an action that harmed every single person in the world, as long as it hurt the best off slightly more than the worst off. That seems insanely wrong. The Utility Monster thought experiment already argues against making utility maximization lexically prior to equality.
So it seems like the best option would be to have maximizing utility and increasing equality as two separate values. How, then, to trade one off against the other? If there is some sort of straight, one-to-one exchange rate between them, this does nothing to dissolve the problem of the Utility Monster. A monster good enough at utility generation could simply produce so much utility that no amount of equality could equal its output.
The best possible solution I can see would be to have utility maximization and equality have diminishing returns relative to each other. This would mean that in a world with high equality, but low utility, raising utility would be more important, while in a world of low equality and high utility, establishing equality would be more important.
This solution deals with the utility monster fairly effectively. No matter how much utility the monster can generate, it is always better to share some of its resources with other people.
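Here is a minimal numerical sketch of that claim, assuming square-root (concave) curves for both terms and using the worst-off person's utility as a crude equality proxy. All of these modeling choices are mine, not anything canonical:

```python
import math

def welfare(utilities):
    # Toy social welfare with diminishing returns on both dimensions:
    # concave in total utility, and concave in "equality", crudely
    # proxied here by the worst-off individual's utility.
    return math.sqrt(sum(utilities)) + math.sqrt(min(utilities))

def outcome(resources, monster_rate, population, monster_share):
    # The monster converts resources to utility at monster_rate;
    # ordinary people convert at rate 1 and split the rest evenly.
    monster = monster_rate * resources * monster_share
    each = resources * (1 - monster_share) / population
    return [monster] + [each] * population

R, POP = 100.0, 10
grid = [i / 1000 for i in range(1001)] + [1 - 10**-k for k in range(4, 10)]
for rate in (10, 1_000, 100_000):   # ever more talented monsters
    best = max(grid, key=lambda s: welfare(outcome(R, rate, POP, s)))
    print(f"monster rate {rate:>7}: optimal monster share = {best}")
```

Under these assumptions the welfare-maximizing share the monster keeps stays strictly below 1.0 no matter how talented it is; it only creeps closer to 1.0 as the monster's efficiency grows, which matches the point made above.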
Now, you might notice that this doesn't eliminate every aspect of the utility monster problem. As long as the returns generated by utility maximization do not diminish to zero you can always posit an even more talented monster. And you can then argue that the society created by having that monster enslave the rest of the populace is better than one where a less talented monster shares with the rest of the populace. However, this new society would instantly become better if the new Utility Monster was forced to share its resources with the rest of the population.
This is a huge improvement over the old framework. Ordinary utility-maximizing ethics would not merely argue that a world where a Utility Monster enslaved everyone else might be a better world; it would argue that it was the optimal world, the best possible world given the constraints the inhabitants face. Under this new ethical framework, however, that is never the case. The optimal world, under any given level of constraints, is one where a utility monster shares with the rest of the population.
In other words, under this framework, if you were to ask, "Is it good for a utility monster to enslave the rest of the population?" the answer would always be "No."
Obviously the value of equality has many other aspects to be considered. For instance, is it better described by traditional egalitarianism, or by prioritarianism? Values are often more complex than they first appear.
It also seems quite possible that there are other facets of value besides maximizing utility and equality of utility. For instance, total and average utilitarianism might be reconciled by making them two separate values that are both important. Other potential candidates include prioritarian concerns (if they are not included already), number of worthwhile lives (most people would consider a world full of people with excellent lives better than one inhabited solely by one ecstatic utility monster), consideration of prior-existing people, and perhaps many, many more. As with utility and equality, these values would have diminishing returns relative to each other, and an optimum society would be one where all receive some measure of consideration.
An aside. This next section is not directly related to the rest of the essay, but develops the idea in a direction I thought was interesting:
It seems to me that the value of equality could be the source of a local disagreement in population ethics. There are several people (Robin Hanson, most notably) who have argued that it would be highly desirable to create huge numbers of poor people with lives barely worth living, and that this may well be better than having a smaller, wealthier population. Many other people consider this to be a bad idea.
The unspoken assumption in this argument is that multiple lives barely worth living generate more utility than a single very excellent life. At first this seems like an obvious truth, based on the following chain of logic:
1. It is obviously wrong for Person A, who has a life barely worth living, to kill Person B, who also has a life barely worth living, and use B's property to improve their own life.
2. The only reason something is wrong is that it decreases the level of utility.
3. Therefore, killing Person B must decrease the level of utility.
4. Therefore, two lives barely worth living must generate more utility than a single excellent life.
However, if equality is valued for its own sake, then the reason it is wrong to kill Person B might be because of the vast inequality in various aspects of utility (lifespan, for instance) that their death would create between A and B.
This means that a society that has a smaller population living great lives might very well be generating a much larger amount of utility than a larger society whose inhabitants live lives barely worth living.
An article from the Wall Street Journal. The original title might be slightly mind-killing for some people, but I found it moderately interesting especially considering that many LessWrongers formed part of the data set for the study the article talks about and a large fraction of us identified as libertarian on the last survey.
Inside the Cold, Calculating Libertarian Mind
An individual's personality shapes his or her political ideology at least as much as circumstances, background and influences. That is the gist of a recent strand of psychological research identified especially with the work of Jonathan Haidt. The baffling (to liberals) fact that a large minority of working-class white people vote for conservative candidates is explained by psychological dispositions that override their narrow economic interests.
In his recent book "The Righteous Mind," Dr. Haidt confronted liberal bafflement and made the case that conservatives are motivated by morality just as liberals are, but also by a larger set of moral "tastes"—loyalty, authority and sanctity, in addition to the liberal tastes for compassion and fairness. Studies show that conservatives are more conscientious and sensitive to disgust but less tolerant of change; liberals are more empathic and open to new experiences.
But ideology does not have to be bipolar. It need not fall on a line from conservative to liberal. In a recently published paper, Ravi Iyer from the University of Southern California, together with Dr. Haidt and other researchers at the data-collection platform YourMorals.org, dissect the personalities of those who describe themselves as libertarian.
These are people who often call themselves economically conservative but socially liberal. They like free societies as well as free markets, and they want the government to get out of the bedroom as well as the boardroom. They don't see why, in order to get a small-government president, they have to vote for somebody who is keen on military spending and religion; or to get a tolerant and compassionate society they have to vote for a large and intrusive state.
The study collated the results of 16 personality surveys and experiments completed by nearly 12,000 self-identified libertarians who visited YourMorals.org. The researchers compared the libertarians to tens of thousands of self-identified liberals and conservatives. It was hardly surprising that the team found that libertarians strongly value liberty, especially the "negative liberty" of freedom from interference by others. Given the philosophy of their heroes, from John Locke and John Stuart Mill to Ayn Rand and Ron Paul, it also comes as no surprise that libertarians are also individualistic, stressing the right and the need for people to stand on their own two feet, rather than the duty of others, or government, to care for people.
Perhaps more intriguingly, when libertarians reacted to moral dilemmas and in other tests, they displayed less emotion, less empathy and less disgust than either conservatives or liberals. They appeared to use "cold" calculation to reach utilitarian conclusions about whether (for instance) to save lives by sacrificing fewer lives. They reached correct, rather than intuitive, answers to math and logic problems, and they enjoyed "effortful and thoughtful cognitive tasks" more than others do.
The researchers found that libertarians had the most "masculine" psychological profile, while liberals had the most feminine, and these results held up even when they examined each gender separately, which "may explain why libertarianism appeals to men more than women."
All Americans value liberty, but libertarians seem to value it more. For social conservatives, liberty is often a means to the end of rolling back the welfare state, with its lax morals and redistributive taxation, so liberty can be infringed in the bedroom. For liberals, liberty is a way to extend rights to groups perceived to be oppressed, so liberty can be infringed in the boardroom. But for libertarians, liberty is an end in itself, trumping all other moral values.
Dr. Iyer's conclusion is that libertarians are a distinct species—psychologically as well as politically.
A version of this article appeared September 29, 2012, on page C4 in the U.S. edition of The Wall Street Journal, with the headline: Inside the Cold, Calculating Libertarian Mind.
The original paper.
Understanding Libertarian Morality: The Psychological Roots of an Individualist Ideology
Abstract: Libertarians are an increasingly vocal ideological group in U.S. politics, yet they are understudied compared to liberals and conservatives. Much of what is known about libertarians is based on the writing of libertarian intellectuals and political leaders, rather than surveying libertarians in the general population. Across three studies, 15 measures, and a large web-based sample (N = 152,239), we sought to understand the morality of self-described libertarians. Based on an intuitionist view of moral judgment, we focused on the underlying affective and cognitive dispositions that accompany this unique worldview. We found that, compared to liberals and conservatives, libertarians show 1) stronger endorsement of individual liberty as their foremost guiding principle and correspondingly weaker endorsement of other moral principles, 2) a relatively cerebral as opposed to emotional intellectual style, and 3) lower interdependence and social relatedness. Our findings add to a growing recognition of the role of psychological predispositions in the organization of political attitudes.
I was surprised to see the high number of moral realists on Less Wrong, so I thought I would bring up a (probably unoriginal) point that occurred to me a while ago.
Let's say that all your thoughts either seem factual or fictional. Memories seem factual, stories seem fictional. Dreams seem factual, daydreams seem fictional (though they might seem factual if you're a compulsive fantasizer). Although the things that seem factual match up reasonably well to the things that actually are factual, this isn't the case axiomatically. If deviating from this pattern is adaptive, evolution will select for it. This could result in situations like: the rule that pieces move diagonally in checkers seems fictional, while the rule that you can't kill people seems factual, even though they're both just conventions. (Yes, the rule that you can't kill people is a very good convention, and it makes sense to have heavy default punishments for breaking it. But I don't think it's different in kind from the rule that you must move diagonally in checkers.)
I'm not an expert, but it definitely seems as though this could actually be the case. Humans are fairly conformist social animals, and it seems plausible that evolution would've selected for taking the rules seriously, even if it meant using the fact-processing system for things that were really just conventions.
Another spin on this: We could see philosophy as the discipline of measuring, collating, and making internally consistent our intuitions on various philosophical issues. Katja Grace has suggested that the measurement of philosophical intuitions may be corrupted by the desire to signal on the part of the philosophy enthusiasts. Could evolutionary pressure be an additional source of corruption? Taking this idea even further, what do our intuitions amount to at all aside from a composite of evolved and encultured notions? If we're talking about a question of fact, one can overcome evolution/enculturation by improving one's model of the world, performing experiments, etc. (I was encultured to believe in God by my parents. God didn't drop proverbial bowling balls from the sky when I prayed for them, so I eventually noticed the contradiction in my model and deconverted. It wasn't trivial--there was a high degree of enculturation to overcome.) But if the question has no basis in fact, like the question of whether morals are "real", then genes and enculturation will wholly determine your answer to it. Right?
Yes, you can think about your moral intuitions, weigh them against each other, and make them internally consistent. But this is kind of like trying to add resolution back into an extremely pixelated photo--just because it's no longer obviously "wrong" doesn't guarantee that it's "right". And there's the possibility of path-dependence--the parts of the photo you try to improve initially could have a very significant effect on the final product. Even if you think you're willing to discard your initial philosophical conclusions, there's still the possibility of accidentally destroying your initial intuitional data or enculturing yourself with your early results.
To avoid this possibility of path-dependence, you could carefully document your initial intuitions, pursue lots of different paths to making them consistent in parallel, and maybe even choose a "best match". But it's not obvious to me that your initial mix of evolved and encultured values even deserves this preferential treatment.
Currently, I disagree with what seems to be the prevailing view on Less Wrong that achieving a Really Good Consistent Match for our morality is Really Darn Important. I'm not sure that randomness from evolution and enculturation should be treated differently from random factors in the intuition-squaring process. It's randomness all the way through either way, right? The main reason "bad" consistent matches are considered so "bad", I suspect, is that they engender cognitive dissonance (e.g. maybe my current ethics says I should hack Osama Bin Laden to death in his sleep with a knife if I get the chance, but this is an extremely bad match for my evolved/encultured intuitions, so I experience a ton of cognitive dissonance actually doing this). But cognitive dissonance seems to me like just another aversive experience to factor in to my utility calculations.
Now that you've read this, maybe your intuition has changed and you're a moral anti-realist. But in what sense has your intuition "improved" or become more accurate?
I really have zero expertise on any of this, so if you have relevant links please share them. But also, who's to say that matters? In what sense could philosophers have "better" philosophical intuition? The only way I can think of for theirs to be "better" is if they've seen a larger part of the landscape of philosophical questions, and are therefore better equipped to build consistent philosophical models (example).
The following is a dialogue intended to illustrate what I think may be a serious logical flaw in some of the conclusions drawn from the famous Mere Addition Paradox.
EDIT: To make this clearer, the interpretation of the Mere Addition Paradox this post is intended to criticize is the belief that a world consisting of a large population full of lives barely worth living is the optimal world. That is, I am disagreeing with the idea that the best way for a society to use the resources available to it is to create as many lives barely worth living as possible. Several commenters have argued that another interpretation of the Mere Addition Paradox is that a sufficiently large population with a lower quality of life will always be better than a smaller population with a higher quality of life, even if such a society is far from optimal. I agree that my argument does not necessarily refute this interpretation, but think the other interpretation is common enough that it is worth arguing against.
EDIT: On the advice of some of the commenters I have added a shorter summary of my argument in non-dialogue form at the end. Since it is shorter I do not think it summarizes my argument as completely as the dialogue, but feel free to read it instead if pressed for time.
Bob: Hi, I'm with R&P cable. We're selling premium cable packages to interested customers. We have two packages to start out with that we're sure you'll love. Package A+ offers a larger selection of basic cable channels and costs $50. Package B offers a larger variety of exotic channels for connoisseurs; it costs $100. If you buy package A+, however, you'll get a 50% discount on B.
Alice: That's very nice, but looking at the channel selection, I just don't think that it will provide me with enough utilons.
Bob: Utilons? What are those?
Alice: They're the unit I use to measure the utility I get from something. I'm really good at shopping, so if I spend my money on the things I usually spend it on I usually get 1.5 utilons for every dollar I spend. Now, looking at your cable channels, I've calculated that I will get 10 utilons from buying Package A+ and 100 utilons from buying Package B. Obviously the total is 110, significantly less than the 150 utilons I'd get from spending $100 on other things. It's just not a good deal for me.
Bob: You think so? Well it so happens that I've met people like you in the past and have managed to convince them. Let me tell you about something called the "Mere Cable Channel Addition Paradox."
Alice: Alright, I've got time, make your case.
Bob: Imagine that the government is going to give you $50. Sounds like a good thing, right?
Alice: It depends on where it gets the $50 from. What if it defunds a program I think is important?
Bob: Let's say that it would defund a program that you believe is entirely neutral. The harms the program causes are exactly outweighed by the benefits it brings, leaving a net utility of zero.
Alice: I can't think of any program like that, but I'll pretend one exists for the sake of the argument. Yes, defunding it and giving me $50 would be a good thing.
Bob: Okay, now imagine the program's beneficiaries put up a stink, and demand the program be re-instituted. That would be bad for you, right?
Alice: Sure. I'd be out $50 that I could convert into 75 utilons.
Bob: Now imagine that the CEO of R&P Cable Company sleeps with an important senator and arranges a deal. You get the $50, but you have to spend it on Package A+. That would be better than not getting the money at all, right?
Alice: Sure. 10 utilons is better than zero. But getting to spend the $50 however I wanted would be best of all.
Bob: That's not an option in this thought experiment. Now, imagine that after you use the money you received to buy Package A+, you find out that the 50% discount for Package B still applies. You can get it for $50. Good deal, right?
Alice: Again, sure. I'd get 100 utilons for $50. Normally I'd only get 75 utilons.
Bob: Well, there you have it. By a mere addition I have demonstrated that a world where you have bought both Package A+ and Package B is better than one where you have neither. The only difference between the hypothetical world I imagined and the world we live in is that in one you are spending money on cable channels. A mere addition. Yet you have admitted that that world is better than this one. So what are you waiting for? Sign up for Package A+ and Package B!
And that's not all. I can keep adding cable packages to get the same result. The end result of my logic, which I think you'll agree is impeccable, is that you purchase Package Z, a package where you spend all the money other than that you need for bare subsistence on cable television packages.
Alice: That seems like a pretty repugnant conclusion.
Bob: It still follows from the logic. For every world where you are spending your money on whatever you have calculated generates the most utilons there exists another, better world where you are spending all your money on premium cable channels.
Alice: I think I found a flaw in your logic. You didn't perform a "mere addition." The hypothetical world differs from ours in two ways, not one. Namely, in this world the government isn't giving me $50. So your world doesn't just differ from this one in terms of how many cable packages I've bought, it also differs in how much money I have to buy them.
Bob: So can I interest you in a special form of the package? This one is in the form of a legally binding pledge. You pledge that if you ever make an extra $50 in the future you will use it to buy Package A+.
Alice: No. In the scenario you describe the only reason buying Package A+ has any value is that it is impossible to get utility out of that money any other way. If I just get $50 for some reason it's more efficient for me to spend it normally.
Bob: Are you sure? I've convinced a lot of people with my logic.
Alice: Like who?
Bob: Well, there were these two customers named Michael Huemer and Robin Hanson who both accepted my conclusion. They've both mortgaged their homes and started sending as much money to R&P cable as they can.
Alice: There must be some others who haven't.
Bob: Well, there was this guy named Derek Parfit who seemed disturbed by my conclusion, but couldn't refute it. The best he could do is mutter something about how the best things in his life would gradually be lost if he spent all his money on premium cable. I'm working on him though, I think I'll be able to bring him around eventually.
Alice: Funny you should mention Derek Parfit. It so happens that the flaw in your "Mere Cable Channel Addition Paradox" is exactly the same as the flaw in a famous philosophical argument he made, which he called the "Mere Addition Paradox."
Bob: Really? Do tell?
Alice: Parfit posited a population he called "A" which had a moderately large population with large amounts of resources, giving them a very high level of utility per person. Then he added a second population, which was totally isolated from the other population. How they were isolated wasn't important, although Parfit suggested maybe they were on separate continents and couldn't sail across the ocean or something like that. These people don't have nearly as many resources per person as the other population, so each person's level of utility is lower (their lack of resources is the only reason they have lower utility). However, their lives are still just barely worth living. He called the two populations "A+."
Parfit asked if "A+" was a better world than "A." He thought it was: since the extra people were totally isolated from the original population, they weren't hurting anyone over there by existing. And their lives were worth living. Follow me so far?
Bob: I guess I can see the point.
Alice: Next Parfit posited a population called "B," which was the same as A+, except that the two populations had merged together. Maybe they got better at sailing across the ocean, it doesn't really matter how. The people share their resources. The result is that everyone in the original population had their utility lowered, while everyone in the second had it raised.
Parfit asked if population "B" was better than "A+" and argued that it was because it had a greater level of equality and total utility.
Bob: I think I see where this is going. He's going to keep adding more people, isn't he?
Alice: Yep. He kept adding more and more people until he reached population "Z," a vast population where everyone had so few resources that their lives were barely worth living. This, he argued, was a paradox, because he argued that most people would believe that Z is far worse than A, but he had made a convincing argument that it was better.
Bob: Are you sure that sharing their resources like that would lower the standard of living for the original population? Wouldn't there be economies of scale and such that would allow them to provide more utility even with less resources per person?
Alice: Please don't fight the hypothetical. We're assuming that it would for the sake of the argument.
Now, Parfit argued that this argument led to the "Repugnant Conclusion," the idea that the best sort of world is one with a large population with lives barely worth living. That confers on people a duty to reproduce as often as possible, even if doing so would lower the quality of their and everyone else's lives.
He claimed that the reason his argument showed this was that he had conducted "mere addition." The populations in his paradox differed in no way other than their size. By merely adding more people he had made the world "better," even if the level of utility per person plummeted. He claimed that "For every population, A, with a high average level of utility there exists another, better population, B, with more people and a lower average level of utility."
Do you see the flaw in Parfit's argument?
Bob: No, and that kind of disturbs me. I have kids, and I agree that creating new people can add utility to the world. But it seems to me that it's also important to enhance the utility of the people who already exist.
Alice: That's right. Normal morality tells us that creating new people with lives worth living and enhancing the utility of people that already exist are both good things to use resources on. Our common sense tells us that we should spend resources on both those things. The disturbing thing about the Mere Addition Paradox is that it seems at first glance to indicate that that's not true, that we should only devote resources to creating more people with barely worthwhile lives. I don't agree with that, of course.
Bob: Neither do I. It seems to me that having a large number of worthwhile lives and a high average utility are both good things and that we should try to increase them both, not just maximize one.
Alice: You're right, of course. But don't say "having a high average utility." Say "use resources to increase the utility of people who already exist."
Bob: What's the difference? They're the same thing, aren't they?
Alice: Not quite. There are other ways to increase average utility than enhancing the utility of existing people. You could kill all the depressed people, for instance. Plus, if there was a world where everyone was tortured 24 hours a day, you could increase average utility by creating some new people who are only tortured 23 hours a day.
Bob: That's insane! Who could possibly be that literal-minded?
Alice: You'd be surprised. The point is, a better way to phrase it is "use resources to increase the utility of people who already exist," not "increase average utility." Of course, that still leaves some stuff out, like the fact that it's probably better to increase everyone's utility equally, rather than focus on just one person. But it doesn't lead to killing depressed people, or creating slightly less tortured people in a Hellworld.
Bob: Okay, so what I'm trying to say is that resources should be used to create people, and to improve people's lives. Also, equality is good. And none of these things should completely eclipse the others; each is too valuable to be sacrificed entirely to maximize just one. So a society that increases all of those values should be considered more efficient at generating value than a society that just maximizes one value. Now that we're done getting our terminology straight, will you tell me what Parfit's mistake was?
Alice: Population "A" and population "A+" differ in two ways, not one. Think about it. Parfit is clear that the extra people in "A+" do not harm the existing people when they are added. That means they do not use any of the original population's resources. So how do they manage to live lives worth living? How are they sustaining themselves?
Bob: They must have their own resources. To use Parfit's example of continents separated by an ocean; each continent must have its own set of resources.
Alice: Exactly. So "A+" differs from "A" both in the size of its population, and the amount of resources it has access to. Parfit was not "merely adding" people to the population. He was also adding resources.
Bob: Aren't you the one who is fighting the hypothetical now?
Alice: I'm not fighting the hypothetical. Fighting the hypothetical consists of challenging the likelihood of the thought experiment happening, or trying to take another option than the ones presented. What I'm doing is challenging the logical coherence of the hypothetical. One of Parfit's unspoken premises is that you need some resources to live a life worth living, so by adding more worthwhile lives he's also implicitly adding resources. If he had just added some extra people to population A without giving them their own continent full of extra resources to live on then "A+" would be worse than "A."
Bob: So the Mere Addition Paradox doesn't confer on us a positive obligation to have as many children as possible, because the amount of resources we have access to doesn't automatically grow with them. I get that. But doesn't it imply that as soon as we get some more resources we have a duty to add some more people whose lives are barely worth living?
Alice: No. Adding lives barely worth living uses the extra resources more efficiently than leaving Parfit's second continent empty for all eternity. But, it's not the most efficient way. Not if you believe that creating new people and enhancing the utility of existing people are both important values.
Let's take population "A+" again. Now imagine that instead of having a population of people with lives barely worth living, the second continent is inhabited by a smaller population with the same very high level of resources and utility per person as the population of the first continent. Call it "A++." Would you say "A++" was better than "A+?"
Bob: Sure, definitely.
Alice: How about a world where the two continents exist, but the second one was never inhabited? The people of the first continent then discover the second one and use its resources to improve their level of utility.
Bob: I'm less sure about that one, but I think it might be better than "A+."
Alice: So what Parfit actually proved was: "For every population, A, with a high average level of utility there exists another, better population, B, with more people, access to more resources and a lower average level of utility."
And I can add my own corollary to that: "For every population, B, there exists another, better population, C, that has the same access to resources as B, but a smaller population and higher average utility."
Bob: Okay, I get it. But how does this relate to my cable TV sales pitch?
Alice: Well, my current situation, where I'm spending my money on normal things is analogous to Parfit's population "A." High utility, and very efficient conversion of resources into utility, but not as many resources. We're assuming, of course, that using resources to both create new people and improve the utility of existing people is more morally efficient than doing just one or the other.
The situation where the government gives me $50 to spend on Package A+ is analogous to Parfit's population A+. I have more resources and more utility. But the resources aren't being converted as efficiently as they could be.
The situation where I take the 50% discount and buy Package B is equivalent to Parfit's population B. It's a better situation than A+, but not the most efficient way to use the money.
The situation where I get the $50 from the government to spend on whatever I want is equivalent to my population C. A world with more access to resources than A, but more efficient conversion of resources to utility than A+ or B.
Bob: So what would a world where the government kept the money be analogous to?
Alice: A world where Parfit's second continent was never settled and remained uninhabited for all eternity, its resources never used by anyone.
Bob: I get it. So the Mere Addition Paradox doesn't prove what Parfit thought it did? We don't have any moral obligation to tile the universe with people whose lives are barely worth living?
Alice: Nope, we don't. It's more morally efficient to use a large percentage of our resources to enhance the lives of those who already exist.
Bob: This sure has been a fun conversation. Would you like to buy a cable package from me? We have some great deals.
My argument is that Parfit's Mere Addition Paradox doesn't prove what it seems to. The argument behind the Mere Addition Paradox is that you can make the world a better place by the "mere addition" of extra people, even if their lives are barely worth living. In other words: "For every population, A, with a high average level of utility there exists another, better population, B, with more people and a lower average level of utility." This supposedly leads to the Repugnant Conclusion, the belief that a world full of people whose lives are barely worth living is better than a world with a smaller population where the people lead extremely fulfilled and happy lives.
Parfit demonstrates this by moving from world A, consisting of a population full of people with lots of resources and high average utility, to world A+. World A+ has an additional population of people who are isolated from the original population and not even aware of the other's existence. The extra people live lives just barely worth living. Parfit argues that A+ is a better world than A because everyone in it has lives worth living, and the additional people aren't hurting anyone by existing because they are isolated from the original population.
Parfit then moves from World A+ to World B, where the populations are merged and share resources. This lowers the standard of living for the original people and raises it for the newer people. Parfit argues that B must be better than A+, because it has higher total utility and equality. He then keeps adding people until he reaches Z, a world where everyone's lives are barely worth living and the population is vast. He argues that this is a paradox because most people would agree that Z is not a desirable world compared to A.
I argue that the Mere Addition Paradox is a flawed argument because it does not just add people, it also adds resources. The fact that the extra people in A+ do not harm the original people of A by existing indicates that their population must have a decent amount of resources to live on, even if it is not as many per person as the population of A. For this reason what the Mere Addition Paradox proves is not that you can make the world better by adding extra people, but rather that you can make it better by adding extra people and resources to support them. I use a series of choices about purchasing cable television packages to illustrate this in concrete terms.
I further argue for a theory of population ethics that values both using resources to create lives worth living, and using resources to enhance the utility of already existing people, and considers the best sort of world to be one where neither of these two values totally dominate the other. By this ethical standard A+ might be better than A because it has more people and resources, even if the average level of utility is lower. However, a world with the same amount of resources as A+, but a lower population and the same, or higher average utility as A is better than A+.
The main unsatisfying thing about my argument is that while it avoids the Repugnant Conclusion in most cases, it might still lead to it, or something close to it, in situations where creating new people and getting new resources are, as one commenter noted, a “package deal.” In other words, a situation where it is impossible to obtain new resources without creating some new people whose utility levels are below average. However, even in this case, my argument holds that the best world of all is one where it would be possible to obtain the resources without creating new people, or creating a smaller amount of people with higher utility.
In other words, the Mere Addition Paradox does not prove that: "For every population, A, with a high average level of utility there exists another, better population, B, with more people and a lower average level of utility." Instead what the Mere Addition Paradox seems to demonstrate is that: "For every population, A, with a high average level of utility there exists another, better population, B, with more people, access to more resources and a lower average level of utility." Furthermore, my own argument demonstrates that: "For every population, B, there exists another, better population, C, which has the same access to resources as B, but a smaller population and higher average utility."
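To make these claims concrete, here is a toy model in Python with invented numbers (the populations and resource figures are mine, not Parfit's; I also assume each person's utility simply equals their share of resources, which is all the argument needs):

```python
# A toy model of the summary above, with invented numbers (mine, not
# Parfit's). Assumption: each person's utility simply equals their
# share of their group's resources, so average utility is just
# resources per person.

def world(name, groups):
    """groups: list of (population, resources) pairs."""
    pop = sum(p for p, _ in groups)
    res = sum(r for _, r in groups)
    print(f"{name}: pop={pop}, resources={res}, avg utility={res / pop:.1f}")

world("A",  [(100, 1000)])              # one well-off continent
world("A+", [(100, 1000), (100, 200)])  # adds 100 people AND 200 resources
world("B",  [(200, 1200)])              # merged: same totals as A+, shared equally
world("C",  [(120, 1200)])              # same resources as B, fewer people

# A+ is not "mere addition": it has 1200 resources where A had 1000.
# B matches A+ in totals (this crude model ignores equality), and C
# illustrates the corollary: same resources as B, higher average utility.
```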
So, a little background- I've just come out as an atheist to my dad, a Christian pastor, who's convinced he can "fix" my thinking and is bombarding me with a number of flimsy arguments that I'm having trouble articulating a response to, and need help shutting down. The particular issue at the moment deals with non-theistic explanations for human psychology and things like love, morality, and beauty. After attempting to communicate explanations from evolutionary psychology, I was met with amused dismissal of the subject as "speculation".
There's one book in particular he's having me read- The Reason for God by Timothy Keller. In the book, he brings up evolutionary psychology as an alternative to theistic explanations, and immediately dismisses it as apparently self-defeating.
"Evolutionists say that if God makes sense to us, it is not because he is really there, it's only because that belief helped us survive and so we are hardwired for it. However, if we can't trust our belief-forming faculties to tell us the truth about God, why should we trust them to tell us the truth about anything, including evolutionary science? If our cognitive faculties only tell us what we need to survive, not what is true, why trust them about anything at all?" -Timothy Keller
The obvious answer is that knowing the truth about things is generally advantageous to survival- but it hardly addresses the underlying assertion- that without [incredibly specific collection of god-beliefs and assorted dogmas], human brains can't arrive at truth because they weren't designed for it. And of course, I'm talking to a guy with an especially exacting definition of "truth" (100% certainty about the territory)- I could use an LW post that succinctly discusses the role and definition of truth, there.
Another thing Dad likes to do is back me into a corner WRT morality and moral relativism- "Oh, but can you really believe that the act of rape doesn't have an inherent [wrongness]? Are you saying it was justified for [insert historical monster] to do [atrocity] because it would make him reproductively successful?" Armed only with evolutionary explanations for their behavior, I couldn't really respond- possibly my fault, since I haven't read the Morality sequence on account of I got stuck in the Quantum Physics ultrasequence, and knowing that reality is composed of complex amplitudes flowing between explicit configurations or aaasasdjgasjdga whatever the frig even (I CAN'T) has proven to be staggeringly unhelpful in this situation.
In addition to particular arguments WRT the question posed, I could also use recommendations for good, well-argued and accessible books on the subject of evolutionary psychology, with a focus on practical experimental results and application- the guy can't be given a book and not read it, so I'm hoping to at least get him to not dismiss the science as "speculation" or a joke. It's likely he's aware that the field of evolutionary psychology is really prone to hindsight bias and thus ignores it completely, so along with the book, a good article or study demonstrating the accuracy and predictive power of the evolutionary psychological model would be appreciated.
Robin Hanson has done a great job of describing the future world and economy, under the assumption that easily copied "uploads" (whole brain emulations), and the standard laws of economics continue to apply. To oversimplify the conclusion:
- There will be great and rapidly increasing wealth. On the other hand, the uploads will be in Darwinian-like competition with each other and with copies, which will drive their wages down to subsistence levels: whatever is required to run their hardware and keep them working, and nothing more.
The competition will not so much be driven by variation, but by selection: uploads with the required characteristics can be copied again and again, undercutting and literally crowding out any uploads wanting higher wages.
Some have focused on the possibly troubling aspects of voluntary or semi-voluntary death: some uploads would be willing to make copies of themselves for specific tasks, which would then be deleted or killed at the end of the process. This can pose problems, especially if the copy changes its mind about deletion. But much more troubling is the mass death among uploads that always wanted to live.
What the selection process will favour is agents that want to live (if they didn't, they'd die out) and are willing to work for an expectation of subsistence-level wages. But now add a little risk to the process: not all jobs pay exactly the expected amount; sometimes they pay slightly more, sometimes slightly less. Since wages sit at subsistence, there is no margin to absorb a shortfall, so half of all jobs will result in a life-loving upload dying (charging extra to pay for insurance would squeeze that upload out of the market). Iterating the process means that the vast majority of the uploads will end up being killed - if not initially, then at some point later. The picture changes somewhat if you consider "super-organisms" of uploads and their copies, but then the issue simply shifts to wage competition between the super-organisms.
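A minimal simulation of this iteration (the model is my own toy assumption, not from the original argument: symmetric noise around a subsistence wage, no savings margin, no insurance):

```python
# Toy simulation: every upload earns exactly subsistence in
# expectation, with symmetric noise, and any below-subsistence job is
# fatal. (All parameters here are illustrative assumptions.)
import random

random.seed(0)
N_UPLOADS, N_JOBS = 10_000, 50
alive = N_UPLOADS

for _ in range(N_JOBS):
    # Each surviving upload's pay is subsistence + noise; with symmetric
    # noise, each has a 1/2 chance of falling short and being deleted.
    alive = sum(1 for _ in range(alive) if random.gauss(0, 1) >= 0)

print(f"original uploads surviving {N_JOBS} jobs: {alive} of {N_UPLOADS}")
# Expected survivors: 10_000 * 0.5**50, i.e. essentially zero; the
# population persists only through copies of the lucky.
```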
The only way this can be considered acceptable is if the killing of a (potentially unique) agent that doesn't want to die is exactly compensated by the copying of another already existent agent. I don't find myself in the camp arguing that that would be a morally neutral or positive action.
I recently learned that a friend of mine, and a long-time atheist (and atheist blogger), is planning to convert to Catholicism. It seems the impetus for her conversion was increasing frustration that she had no good naturalistic account for objective morality in the form of virtue ethics; that upon reflection, she decided she felt like morality "loved" her; that this feeling implied God; and that she had sufficient "if God, then Catholicism" priors to point toward Catholicism, even though she's bisexual (!) and purports to still feel uncertain about the Church's views on sexuality. (Side note: all of this information is material she's blogged about herself, so it's not as if I'm sharing personal details she would prefer to be kept private.)
First, I want to state the rationality lesson I learned from this episode: atheists who spend a great deal of their time analyzing and even critiquing the views of a particular religion are at-risk atheists. Eliezer's spoken about this sort of issue before ("Someone who spends all day thinking about whether the Trinity does or does not exist, rather than Allah or Thor or the Flying Spaghetti Monster, is more than halfway to Christianity."), but I guess it took a personal experience to really drive the point home. When I first read my friend's post, I had a major "I notice that I am confused" moment, because it just seemed so implausible that someone who understood actual atheist arguments (as opposed to dead little sister Hollywood Atheism) could convert to religion, and Catholicism of all things. I seriously considered (and investigated) the possibility that her post was some kind of prank or experiment or otherwise not sincere, or that her account had been hijacked by a very good impersonator (both of these seem quite unlikely at this point).
But then I remembered how I had been frustrated in the past by her tolerance for what seemed like rank religious bigotry and how often I thought she was taking seriously theological positions that seemed about as likely as the 9/11 attacks being genuinely inspired and ordained by Allah. I remembered how I thought she had a confused conception of meta-ethics and that she often seemed skeptical of reductionism, which in retrospect should have been a major red flag for purported atheists. So yeah, spending all your time arguing about Catholic doctrine really is a warning sign, no matter how strongly you seem to champion the "atheist" side of the debate. Seriously.
But second, and more immediately, I wonder if anybody has advice on how to handle this, or if they've had similar experiences with their friends. I do care about this person, and I was devastated to hear this news, so if there's something I can do to help her, I want to. Of course, I would prefer most that she stop worrying about religion entirely and just grok the math that makes religious hypotheses so unlikely as to not be worth your time. But in the short term I'd settle for her not becoming a Catholic, and not immersing herself further in Dark Side Epistemology or surrounding herself with people trying to convince her that she needs to "repent" of her sexuality.
I think I have a pretty good understanding of the theoretical concepts at stake here, but I'm not sure where to start or what style of argument is likely to have the best effect at this point. My tentative plan is to express my concern, try to get more information about what she's thinking, and get a dialogue going (I expect she'll be open to this), but I wanted to see if you all had more specific suggestions, especially if you've been through similar experiences yourself. Thanks!
It's always good news when someone else develops an idea independently of you. It's a sign you might be onto something. Which is why I was excited to discover that Alan Carter, Professor Emeritus of the University of Glasgow's Department of Philosophy, has developed the concept of Complexity of Value independently of Less Wrong.
As far as I can tell Less Wrong does not know of Carter, the only references to his existence I could find on LW and OB were written by me. Whether Carter knows of LW or OB is harder to tell, but the only possible link I could find online was that he has criticized the views of Michael Huemer, who knows Bryan Caplan, who knows Robin Hanson. This makes it all the more interesting that Carter has developed views on value and morality very similar to ones commonly espoused on Less Wrong.
The Complexity of Value is one of the more important concepts in Less Wrong. It has been elaborated on its wiki page, as well as some classic posts by Eliezer. Carter has developed the same concept in numerous papers, although he usually refers to it as “a plurality of values” or “multidimensional axiology of value.” I will focus the discussion on working papers Carter has on the University of Glasgow’s website, as they can be linked to directly without having to deal with a pay wall. In particular I will focus on his paper "A Plurality of Values."
Carter begins the paper by arguing:
Wouldn’t it be nice if we were to discover that the physical universe was reducible to only one kind of fundamental entity? ... Wouldn’t it be nice, too, if we were to discover that the moral universe was reducible to only one kind of valuable entity—or one core value, for short? And wouldn’t it be nice if we discovered that all moral injunctions could be derived from one simple principle concerning the one core value, with the simplest and most natural thought being that we should maximize it? There would be an elegance, simplicity and tremendous justificatory power displayed by the normative theory that incorporated the one simple principle. The answers to all moral questions would, in theory at least, be both determinate and determinable. It is hardly surprising, therefore, that many moral philosophers should prefer to identify, and have thus sought, the one simple principle that would, hopefully, ground morality.
And it is hardly surprising that many moral philosophers, in seeking the one simple principle, should have presumed, explicitly or tacitly, that morality must ultimately be grounded upon the maximization of a solitary core value, such as quantity of happiness or equality, say. Now, the assumption—what I shall call the presumption of value-monism—that there is to be identified a single core axiological value that will ultimately ground all of our correct moral decisions has played a critical role in the development of ethical theory, for it clearly affects our responses to certain thought-experiments, and, in particular, our responses concerning how our normative theories should be revised or concerning which ones ought to be rejected.
Most members of this community will immediately recognize the similarities between these paragraphs and Eliezer’s essay “Fake Utility Functions.” The presumption of value monism sounds quite similar to Eliezer’s description of “someone who has discovered the One Great Moral Principle, of which all other values are a mere derivative consequence.” Carter's opinion of such people is quite similar to Eliezer's.
While Eliezer discovered the existence of the Complexity of Value by working on Friendly AI, Carter discovered it by studying some of the thornier problems in ethics, such as the Mere Addition Paradox and what Carter calls the Problem of the Ecstatic Psychopath. Many Less Wrong readers will be familiar with these problems; they have been discussed numerous times in the community.
For those who aren't, in brief: the Mere Addition Paradox shows that if one sets maximizing total wellbeing as the standard of value, one is led to what is commonly called the Repugnant Conclusion, the belief that a huge population of people with lives barely worth living is better than a somewhat smaller population of people with extremely worthwhile lives. The Problem of the Ecstatic Psychopath is the inverse: if one takes average wellbeing as the standard of value, a population of one immortal ecstatic psychopath, with a nonsentient machine to care for all its needs, is better than a population of trillions of very happy and satisfied, but not ecstatic, people.
Carter describes both of these problems in his paper and draws an insightful conclusion:
In short, surely the most plausible reason for the counter-intuitive nature of any mooted moral requirement to bring about, directly or indirectly, the world of the ecstatic psychopath is that either a large total quantity of happiness or a large number of worthwhile lives is of value; and surely the most plausible reason for the counter-intuitive nature of any mooted injunction to bring about, directly or indirectly, the world of the Repugnant Conclusion is that a high level of average happiness is also of value.
How is it that we fail to notice something so obvious? I submit: because we are inclined to dismiss summarily any value that fails to satisfy our desire for the one core value—in other words, because of the presumption of value-monism.
Once Carter has established the faults of value monism he introduces value pluralism to replace it.1 He introduces two values to start with, “number of worthwhile lives” and “the level of average happiness,” which both contribute to “overall value.” However, their contributions have diminishing returns,2 so a large population with low average happiness and a tiny population with extremely high average happiness are both worse than a moderately sized population with moderately high average happiness.
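To illustrate how such an axiology might behave, here is a minimal sketch in Python. The saturating functional form and all the constants are my own illustrative assumptions; Carter argues for the shape of the view, not for any particular formula:

```python
# A sketch of a "multidimensional axiology" with diminishing returns.
# The functional form and constants are invented for illustration.

def saturating(x, k):
    """Bounded contribution with diminishing returns, approaching 1."""
    return x / (x + k)

def overall_value(worthwhile_lives, avg_happiness):
    # Each value contributes, but saturates: neither a vast population
    # nor sky-high average happiness can substitute for the other.
    return saturating(worthwhile_lives, 1_000) + saturating(avg_happiness, 50)

print(overall_value(10**9, 0.1))   # Repugnant Conclusion world: ~1.00
print(overall_value(1, 10**6))     # ecstatic psychopath: ~1.00
print(overall_value(10**5, 500))   # moderate on both counts: ~1.90
```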
This is a fairly novel use of the idea of the complexity of value, as far as I know. I've read a great deal of Less Wrong's discussion of the Mere Addition Paradox, and most attempts to resolve it have consisted of either trying to reformulate Average Utilitarianism so that it does not lead to the Problem of the Ecstatic Psychopath, or redefining "a life barely worth living" upwards so that it is much less horrible than one would initially think. The idea of agreeing that increasing total wellbeing is important, but not the be-all and end-all of morality, did not seem to come up, although if it did and I missed it I'd be very happy if someone posted a link to that thread.
Carter’s resolution of the Mere Addition Paradox makes a great deal of sense, as it manages to avoid every single repugnant and counterintuitive conclusion that Total and Average Utilitarianism draw by themselves while still being completely logically consistent. In fact, I think that most people who reject the Repugnant Conclusion will realize that this was their True Rejection all along. I am tempted to say that Carter has discovered Theory X, the hypothetical theory of population ethics Derek Parfit believed could accurately describe the ethics of creating more people without implying any horrifying conclusions.
Carter does not stop there, however; he then moves to the problem of what he calls "pleasure wizards" (many readers may be more familiar with the term "utility monster"). The pleasure wizard can convert resources into utility much more efficiently than a normal person, and hence it can be argued that it deserves more resources. Carter points out that:
…such pleasure-wizards, to put it bluntly, do not exist... But their opposites do. And the opposites of pleasure-wizards—namely, those who are unusually inefficient at converting resources into happiness—suffice to ruin the utilitarian’s egalitarian pretensions. Consider, for example, those who suffer from, what are currently, incurable diseases. … an increase in their happiness would require that a huge proportion of society’s resources be diverted towards finding a cure for their rare condition. Any attempt at a genuine equality of happiness would drag everyone down to the level of these unfortunates. Thus, the total amount of happiness is maximized by diverting resources away from those who are unusually inefficient at converting resources into happiness. In other words, if the goal is, solely, to maximize the total amount of happiness, then giving anything at all to such people and spending anything on cures for their illnesses is a waste of valuable resources. Hence, given the actual existence of such unfortunates, the maximization of happiness requires a considerable inequality in its distribution.
Carter argues that, while most people don’t think all of society’s resources should be diverted to help the very ill, the idea that they should not be helped at all also seems wrong. He also points out that to a true utilitarian the nonexistence of pleasure wizards should be a tragedy:
So, the consistent utilitarian should greatly regret the non-existence of pleasure-wizards; and the utilitarian should do so even when the existence of extreme pleasure-wizards would morally require everyone else to be no more than barely happy.
Yet, this is not how utilitarians behave, he argues, rather:
As I have yet to meet a utilitarian, and certainly not a monistic one, who admits to thinking that the world would be a better place if it contained an extreme pleasure-wizard living alongside a very large population all at that level of happiness where their lives were just barely worth living…But if they do not bemoan the lack of pleasure-wizards, then they must surely value equality directly, even if they hide that fact from themselves. And this suggests that the smile of contentment on the faces of utilitarians after they have deployed diminishing marginal utility in an attempt to show that their normative theory is not incompatible with egalitarianism has more to do with their valuing of equality than they are prepared to admit.
Carter resolves the pleasure wizard problem by suggesting equality as an end in itself, a third contributing value towards overall value. Pleasure wizards should not get all the resources because equality is valuable for its own sake, not just because of diminishing marginal utility. As with average happiness and total worthwhile lives, equality is balanced against other values, rather than dominating them. It may often be ethical for a society to sacrifice some amount of equality to increase the total and average wellbeing.
Carter then briefly states that, though he only discusses three in this paper, there are many other dimensions of value that could be added. It might even be possible to add some form of deontological rules or virtue ethics to the complexity of value, although they would be traded off against consequentialist considerations. He concludes the paper by reiterating that:
Thus, in avoiding the Repugnant Conclusion, the Problem of the Ecstatic Psychopath and the problems posed by pleasure-wizards, as well as the problems posed by any unmitigated demand to level down, we appear to have identified an axiology that is far more consistent with our considered moral judgments than any entailing these counter-intuitive implications.
Carter has numerous other papers discussing the concept in more detail, but “A Plurality of Values” is the most thorough. Other good ones include “How to solve two addition paradoxes and avoid the Repugnant Conclusion,” which more directly engages the Mere Addition Paradox and some of its defenders like Michael Huemer; "Scrooge and the Pleasure Witch," which discusses pleasure wizards and equality in more detail; and “A pre-emptive response to some possible objections to a multidimensional axiology with variable contributory values,” which is exactly what it says on the tin.
On closer inspection it was not hard to see why Carter had developed theories so close to those of Eliezer and other members of Less Wrong and SIAI communities. In many ways their two tasks are similar. Eliezer and the SIAI are trying to devise a theory of general ethics that cannot be twisted into something horrible by a rules-lawyering Unfriendly AI, while Carter is trying to devise a theory of population ethics that cannot be twisted into something horrible by rules-lawyering humans. The worlds of the Repugnant Conclusion and the Ecstatic Psychopath are just the sort of places a poorly programmed AI with artificially simple values would create.
I was very pleased to see an important Less Wrong concept had a defender in mainstream academia. I was also pleased to see that Carter had not just been content to develop the concept of the Complexity of Value. He was also able to employ the concept in a new way, successfully resolving one of the major quandaries of modern philosophy.
2 Theodore Sider proposed a theory called "geometrism" in 1991 that also relies on diminishing returns, but geometrism is still a monist theory: it applies geometric diminishing returns to the people in the scenario, rather than to the values that creating those people was meant to serve.
Edited - To remove a reference to Aumann's Agreement Theorem that the commenters convinced me was unnecessary and inaccurate.
Just a minor thought connected with the orthogonality thesis: if you claim that any superintelligence will inevitably converge to some true code of morality, then you are also claiming that no measures can be taken by its creators to prevent this convergence. In other words, the superintelligence will be uncontrollable.
In Magical Categories, Eliezer criticizes using machine learning to learn the concept of "smile" from examples. "Smile" sounds simple to humans but is actually a very complex concept. It only seems simple to us because we find it useful.
If we saw pictures of smiling people on the left and other things on the right, we would realize that smiling people go to the left and categorize new things accordingly. A supervised machine learning algorithm, on the other hand, will likely learn something other than what we think of as "smile" (such as "containing things that pass the smiley face recognizer") and categorize molecular smiley faces as smiles.
This is because simplicity is subjective: a human will consider "happy" and "person" to be basic concepts, so the intended definition of smile as "expression of a happy person" is simple. A computational Occam's Razor will consider this correct definition to be a more complex concept than "containing things that pass the smiley face recognizer". I'll use the phrase "magical category" to refer to concepts that have a high Kolmogorov complexity but that people find simple.
I hope that it's possible to create conditions under which the computer will have an inductive bias towards magical categories, as humans do. I think that people find these concepts simple because they're useful to explain things that humans want to explain (such as interactions with people or media depicting people). The video has pixels arranged in this pattern because it depicts a person who is happy because he is eating chocolate.
So, maybe it's possible to learn these magical categories from a lot of data, by compressing the categorizer along with the data. Here's a sketch of a procedure for doing this:
1. Amass a large collection of data from various societies, containing photographs, text, historical records, etc.
2. Come up with many categories (say, one for each noun in a long list). For each category, decide which pieces of data fit the category.
3. Find categorizer_1, categorizer_2, ..., categorizer_n to minimize K(dataset + categorizer_1 + categorizer_2 + ... + categorizer_n)
What do these mean:
- K(x) is the Kolmogorov complexity of x; that is, the length of the shortest (program,input) pair that, when run, produces x. This is uncomputable so it has to be approximated (such as through resource-bounded data compression).
- + denotes string concatenation. There should be some separator so the boundaries between strings are clear.
- dataset is the collection of data
- categorizer_k is a program that returns "true" or "false" depending on whether the input fits category #k
When learning a new category, find new_categorizer to minimize K(dataset + categorizer_1 + categorizer_2 + ... + categorizer_n + new_categorizer) while still matching the given examples.
Note that while in this example we learn categorizers, in general it should be possible to learn arbitrary functions including probabilistic functions.
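Here's a sketch of the selection step in Python, using zlib's compressed length as a crude, computable stand-in for Kolmogorov complexity. The post only asks for *some* resource-bounded approximation; zlib, the separator, and these function names are all my own choices:

```python
# Approximate the minimization step with a real compressor. zlib is
# far too weak a compressor to be used in earnest; it only shows the
# shape of the computation.
import zlib

SEP = b"\x00"

def approx_K(*parts: bytes) -> int:
    """Approximate K(part_1 + part_2 + ...) by compressed length."""
    return len(zlib.compress(SEP.join(parts), 9))

def pick_categorizer(dataset: bytes, existing: list[bytes],
                     candidates: list[bytes]) -> bytes:
    """Among candidates that already match the labeled examples, pick
    the one minimizing K(dataset + existing categorizers + candidate)."""
    return min(candidates, key=lambda c: approx_K(dataset, *existing, c))
```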
The fact that the categorizers are compressed along with the dataset will create a bias towards categorizers that use concepts useful in compressing the dataset and categorizing other things. From looking at enough data, the concept of "person" naturally arises (in the form of a recognizer/generative model/etc), and it will be used both to compress the dataset and to recognize the "person" category. In effect, because the "person" concept is useful for compressing the dataset, it will be cheap/simple to use in categorizers (such as to recognize real smiling faces).
A useful concept here is "relative complexity" (I don't know the standard name for this), defined as K(x|y) = K(x + y) - K(y). Intuitively this is how complex x is if you already understand y. The categorizer should be trusted in inverse proportion to its relative complexity K(categorizer | dataset and other categorizers); more complex (relative to the data) categorizers are more arbitrary, even given concepts useful for understanding the dataset, and so they're more likely to be wrong on new data.
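The same zlib stand-in gives a computable version of this relative complexity (again a sketch under the same caveats, not a serious approximation of K):

```python
# Approximate K(x|y) = K(x + y) - K(y) by compressing with and
# without the conditioning string.
import zlib

def approx_relative_K(x: bytes, y: bytes) -> int:
    C = lambda s: len(zlib.compress(s, 9))
    return C(x + y) - C(y)

# Per the post, a categorizer scoring low on
# approx_relative_K(categorizer, dataset) reuses structure already
# needed for the data, and should be trusted more on new examples.
```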
If we can use this setup to learn "magical" categories, then Friendly AI becomes much easier. CEV requires the magical concepts "person" and "volition" to be plugged in. So do all seriously proposed complete moral systems. I see no way of doing Friendly AI without having some representation of these magical categories, either provided by humans or learned from data. It should be possible to learn deontological concepts such as "obligation" or "right", and also consequentialist concepts such as "volition" or "value". Some of these are 2-place predicates so they're categories over pairs. Then we can ask new questions such as "Do I have a right to do x in y situation?" All of this depends on whether the relevant concepts have low complexity relative to the dataset and other categorizers.
Using this framework for Friendly AI has many problems. I'm hand-waving the part about how to actually compress the data (approximating Kolmogorov complexity). This is a difficult problem but luckily it's not specific to Friendly AI. Another problem is that it's hard to go from categorizing data to actually making decisions. This requires connecting the categorizer to some kind of ontology. The categorization question that we can actually give examples for would be something like "given this description of the situation, is this action good?". Somehow we have to provide examples of (description,action) pairs that are good or not good, and the AI has to come up with a description of the situation before deciding whether the action is good or not. I don't think that using exactly this framework to make Friendly AI is a good idea; my goal here is to argue that sufficiently advanced machine learning can learn magical categories.
If it is in fact possible to learn magical categories, this suggests that machine learning research (especially related to approximations of Solomonoff induction/Kolmogorov complexity) is even more necessary for Friendly AI than it is for unFriendly AI. I think that the main difficulty of Friendly AI as compared with unFriendly AI is the requirement of understanding magical concepts/categories. Other problems (induction, optimization, self-modification, ontology, etc.) are also difficult but luckily they're almost as difficult for paperclip maximizers as they are for Friendly AI.
This has a relationship to the orthogonality thesis. Almost everyone here would agree with a weak form of the orthogonality thesis: that there exist general optimizer AI programs into which you can plug any goal (such as paperclip maximization). A stronger form of the orthogonality thesis asserts that all ways of making an AI can be easily reduced to specifying its goals and optimization separately; that is, K(AI) ~= K(arbitrary optimizer) + K(goals). My thesis here (that magical categories are simpler relative to data) suggests that the strong form is false. Concepts such as "person" and "value" have important epistemic/instrumental value and can also be used to create goals, so K(Friendly AI) < K(arbitrary optimizer) + K(Friendliness goal). There's really no problem with human values being inherently complex if they're not complex relative to data we can provide to the AI or information it will create on its own for instrumental purposes. Perhaps P(Friendly AI | AGI, passes some Friendliness tests) isn't actually so low even if the program is randomly generated (though I don't actually suggest taking this approach!).
I'm personally working on a programming language for writing and verifying generative models (proving lower bounds on P(data|model)). Perhaps something like this could be used to compress data and categories in order to learn magical categories. If we can robustly learn some magical categories even with current levels of hardware/software, that would be strong evidence for the possibility of creating Friendly AI using this approach, and evidence against the molecular smiley face scenario.
The following is the first draft of my efforts. It's about half as long as the original. It cuts out the section about the Shadowy Figure, which I'm slightly upset about, in particular because it would have made the "beyond the reach of God" line stronger. But I felt that if I tried to include it at all, I had to include several paragraphs that ran a little too long.
I attempted at first to convert it to a "true" poem (not rhyming, but going for a particular meter). I later decided that too much of it needed to have a conversational quality, so it's more of a short play than a poem. Lines are broken up in a particular way to suggest timing and to make it easier to read out loud.
I wanted a) to share the results with people on the chance that someone else might want to perform a little six-minute dialog (my test run clocked in at 6:42), and b) to get feedback on how I chose to abridge things. Do you think there were important sections that could be tied in without making it too long? Do you think some sections that I reworded could be reworded better, or that I missed some?
Edit: I've addressed most of the concerns people had. I think I'm happy with it, at least for my purposes. If people are still concerned by the ending I'll revise it, but I think I've set it up better now.
The Gift We Give Tomorrow
How, oh how could the universe,
itself unloving, and mindless,
cough up creatures capable of love?
No mystery in that.
It's just a matter
of natural selection.
But natural selection is cruel. Bloody.
And bloody stupid!
Even when organisms aren't directly tearing at each other's throats…
…there's a deeper competition, going on between the genes.
A species could evolve to extinction,
if the winning genes were playing negative-sum games.
How could a process,
Cruel as Azathoth,
Create minds that were capable of love?
Mystery is a property of questions.
A mother's child shares her genes,
And so a mother loves her child.
But mothers can adopt children.
And still, come to love them.
Still no mystery.
Evolutionary psychology isn't about deliberately maximizing fitness.
Through most of human history,
we didn't know genes existed.
Well, fine. But still:
Humans form friendships,
even with non-relatives.
How can that be?
Ancient hunter-gatherers would often play the Iterated Prisoner's Dilemma.
There could be profit in betrayal.
But the best solution:
was reciprocal altruism.
The most dangerous human is not the strongest,
or even the smartest:
But the one who has the most allies.
But not all friends are fair-weather friends;
there are true friends -
those who would sacrifice their lives for another.
Shouldn't that kind of devotion
remove itself from the gene pool?
You said it yourself:
We have a concept of true friendship and fair-weather friendship.
We wouldn't be true friends with someone who we didn't think was a true friend to us.
And one with many true friends?
They are far more formidable
than one with mere fair-weather allies.
And Mohandas Gandhi,
who really did turn the other cheek?
Those who try to serve all humanity,
whether or not all humanity serves them in turn?
That’s a more complex story.
Humans aren’t just social animals. We’re political animals.
Sometimes the most formidable human is not the strongest,
but the one who skillfully argues that their preferred policies
match the preferences of others.
How does that explain Gandhi?
The point is that we can argue about 'What should be done?'
We can make those arguments and respond to them.
Without that, politics couldn't take place.
Okay... but Gandhi?
Believed certain complicated propositions about 'What should be done?'
Then did them.
That sounds suspiciously like it could explain any possible human behavior.
If we traced back the chain of causality,
through all the arguments...
We'd find a moral architecture.
The ability to argue abstract propositions.
A preference for simple ideas.
An appeal to hardwired intuitions about fairness.
A concept of duty. Aversion to pain.
Filtered by memetic selection,
all of this resulted in a concept:
"You should not hurt people,"
In full generality.
And that gets you Gandhi.
What else would you suggest?
Some godlike figure?
Reaching out from behind the scenes?
Hell no. But -
Because then I’d have to ask:
How did that god originally decide that love was even desirable?
How did it get preferences that included things like friendship, loyalty, and fairness?
Call it 'surprising' all you like.
But through evolutionary psychology,
You can see how parental love, romance, honor, even true altruism and moral arguments,
all bear the specific design signature of natural selection.
If there were some benevolent god, reaching out to create a world of loving humans,
it too must have evolved,
defeating the point of postulating it at all.
I'm not postulating a god!
I'm just asking how human beings ended up so nice.
Nice? Have you looked at this planet lately?
We bear all those other emotions that evolved as well.
Which should make it very clear that we evolved, should you begin to doubt it.
Humans aren't always nice.
But, still, come on...
doesn't it seem a little...
That nothing but millions of years of a cosmic death tournament…
could cough up mothers and fathers,
sisters and brothers,
husbands and wives,
true altruists and guardians of causes,
police officers and loyal defenders,
even artists, sacrificing themselves for their art?
All practicing so many kinds of love?
For so many things other than genes?
Doing their part to make their world less ugly,
something besides a sea of blood and violence and mindless replication?
Are you honestly surprised by this?
If so, question your underlying model.
For it's led you to be surprised by the true state of affairs.
Since the very beginning,
not one unusual thing has ever happened.
But how are you NOT amazed?
Maybe there’s no surprise from a causal viewpoint.
But still, it seems to me, in the creation of humans by evolution,
something happened that is precious and marvelous and wonderful.
If we can’t call it a physical miracle, then call it a moral miracle.
Because it was only a miracle from the perspective of the morality that was produced?
Explaining away all the apparent coincidence,
from a causal and physical perspective?
Well... yeah. I suppose you could interpret it that way.
I just meant that something was immensely surprising and wonderful on a moral level,
even if it's not really surprising,
on a physical level.
I think that's what I said.
It just seems to me that in your view, somehow you explain that wonder away.
I explain it.
Of course there's a story behind love.
Behind all ordered events, one finds ordered stories.
And that which has no story is nothing but random noise.
Hardly any better.
If you can't take joy in things with true stories behind them,
your life will be empty.
Love has to begin somehow.
It has to enter the universe somewhere.
It’s like asking how life itself begins.
Though you were born of your father and mother,
and though they arose from their living parents in turn,
if you go far and far and far away back,
you’ll finally come to a replicator that arose by pure accident.
The border between life and unlife.
So too with love.
A complex pattern must be explained by a cause
that’s not already that complex pattern. For love to enter the universe,
it has to arise from something that is not love.
If that weren’t possible, then love could not be.
Just as life itself required that first replicator,
to come about by accident,
but still caused:
far, far back in the causal chain that led to you:
3.8 billion years ago,
in some little tidal pool.
Perhaps your children's children will ask,
how it is that they are capable of love.
And their parents will say:
Because we, who also love, created you to love.
And your children's children may ask: But how is it that you love?
And their parents will reply:
Because our own parents,
who loved as well,
created us to love in turn.
And then your children's children will ask:
But where did it all begin?
Where does the recursion end?
And their parents will say:
Once upon a time, long ago and far away,
there were intelligent beings who were not themselves intelligently designed.
Once upon a time, there were lovers,
created by something that did not love.
Once upon a time,
when all of civilization was a single galaxy,
A single star.
A single planet.
A place called Earth.
Ever So Long Ago.
For those not familiar with the topic, Torture vs. Dustspecks asks the question: "Would you prefer that one person be horribly tortured for fifty years without hope or rest, or that 3^^^3 people get dust specks in their eyes?"
Most of the discussion I have seen on the topic adopts one of two assumptions in deriving its answer to that question. I think of the first as the 'linear additive' answer: torture is the proper choice for the utilitarian consequentialist, because a single person can only suffer so much over a fifty-year window, compared to the incomprehensible number of individuals who each suffer only minutely. I think of the second as the 'logarithmically additive' answer, which inverts the choice on the grounds that different forms of suffering are not equal and cannot be added as simple 'units'.
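To make the contrast concrete, here is a toy sketch; every quantity in it is an invented illustration, and 1e100 merely stands in for 3^^^3, which no computer could represent:

```python
import math

# Illustrative assumptions only: the "suffering units" are invented numbers.
TORTURE = 1e9    # hypothetical disutility of fifty years of torture
SPECK = 1e-6     # hypothetical disutility of one dust speck
N = 1e100        # stand-in for 3^^^3 (the real number dwarfs this)

def linear_total(per_harm, n):
    """Linear additivity: tiny harms sum as interchangeable units."""
    return per_harm * n

def log_total(per_harm, n):
    """One 'logarithmically additive' reading: qualitatively trivial harms
    saturate instead of accumulating without bound."""
    return math.log1p(per_harm * n)

print(linear_total(SPECK, N) > TORTURE)  # True: summed specks outweigh the torture, so pick torture
print(log_total(SPECK, N) > TORTURE)     # False: saturated specks never outweigh it, so pick specks
```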
What I have never yet seen is something akin to the notion expressed in Ursula K. Le Guin's The Ones Who Walk Away From Omelas. If you haven't read it, I won't spoil it for you.
I believe that any metric of consequence which takes into account only suffering when making the choice of "torture" vs. "dust specks" misses the point. There are consequences to such a choice that extend beyond the suffering inflicted: moral responsibility, the standards of behavior that either choice makes acceptable, and so on. Any solution to the question which ignores these elements might be useful in revealing one's views about the nature of cumulative suffering, but beyond that it is of no value in making practical decisions -- it cannot be, because 'consequence' extends beyond the mere instantiation of a given choice (the exact pain inflicted by either scenario) into the kind of society that such a choice would result in.
While I myself tend more towards the 'logarithmic' than the 'linear' additive view of suffering, even if I stipulate the linear additive view, I still cannot agree with the conclusion of torture over the dust speck, for the same reason I do not condone torture even in the "ticking time bomb" scenario: I cannot accept the culture/society that would permit such a torture to exist. To arbitrarily select out one individual for maximal suffering in order to spare others a negligible amount would require a legal or moral framework that accepted such choices, and this violates the principle of individual self-determination -- a principle I have seen Less Wrong's community spend a great deal of time trying to incorporate into Friendliness solutions for AGI. We as a society already implement something similar to this, economically: we accept taxing everyone, even according to a graduated scheme. What we do not accept is enslaving 20% of the population to provide for the needs of the State.
If there is a flaw in my reasoning here, please enlighten me.
Human values seem to be at least partly selfish. While it would probably be a bad idea to build AIs that are selfish, ideas from AI design can perhaps shed some light on the nature of selfishness, which we need to understand if we are to understand human values. (How does selfishness work in a decision-theoretic sense? Do humans actually have selfish values?) Current theory suggests three possible ways to design a selfish agent (sketched in toy code after the list):
- have a perception-determined utility function (like AIXI)
- have a static (unchanging) world-determined utility function (like UDT) with a sufficiently detailed description of the agent embedded in the specification of its utility function at the time of the agent's creation
- have a world-determined utility function that changes ("learns") as the agent makes observations (for concreteness, let's assume a variant of UDT where you start out caring about everyone, and each time you make an observation, your utility function changes to no longer care about anyone who hasn't made that same observation)
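Here is a minimal, purely illustrative sketch of the three designs; the classes and the stub World are hypothetical simplifications, not real AIXI or UDT implementations:

```python
# Toy sketch of the three selfish-agent designs; everything here is a
# hypothetical simplification for illustration.
from dataclasses import dataclass, field

@dataclass
class World:
    welfare: dict                                   # agent name -> welfare score
    observers: dict = field(default_factory=dict)   # observation -> set of names

class PerceptionUtility:
    """Type 1 (AIXI-like): utility is a function of the perception stream."""
    def utility(self, perceived_rewards):
        return sum(perceived_rewards)

class StaticWorldUtility:
    """Type 2 (UDT-like): a fixed world-utility with a detailed description
    of the agent embedded at creation time and never updated."""
    def __init__(self, my_description):
        self.me = my_description
    def utility(self, world):
        return world.welfare[self.me]

class LearningWorldUtility:
    """Type 3: starts out caring about everyone; each observation drops from
    the care set anyone who hasn't made that same observation."""
    def __init__(self, everyone):
        self.cared_about = set(everyone)
    def observe(self, observation, world):
        self.cared_about &= world.observers.get(observation, set())
    def utility(self, world):
        return sum(world.welfare[a] for a in self.cared_about)

# Hypothetical usage: after observing "saw_red_door", the type 3 agent stops
# caring about anyone who didn't make that same observation.
w = World(welfare={"alice": 5, "bob": 3}, observers={"saw_red_door": {"alice"}})
agent = LearningWorldUtility(["alice", "bob"])
agent.observe("saw_red_door", w)
print(agent.utility(w))  # 5: bob no longer counts
```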
Note that 1 and 3 are not reflectively consistent (they both refuse to pay the Counterfactual Mugger), and 2 is not applicable to humans (since we are not born with detailed descriptions of ourselves embedded in our brains). Still, it seems plausible that humans do have selfish values, either because we are type 1 or type 3 agents, or because we were type 1 or type 3 agents at some time in the past, but have since self-modified into type 2 agents.
But things aren't quite that simple. According to our current theories, an AI would judge its decision theory using that decision theory itself, and self-modify if it was found wanting under its own judgement. But humans do not actually work that way. Instead, we judge ourselves using something mysterious called "normativity" or "philosophy". For example, a type 3 AI would just decide that its current values can be maximized by changing into a type 2 agent with a static copy of those values, but a human could perhaps think that changing values in response to observations is a mistake, and they ought to fix that mistake by rewinding their values back to before they were changed. Note that if you rewind your values all the way back to before you made the first observation, you're no longer selfish.
So, should we freeze our selfish values, or rewind our values, or maybe even keep our "irrational" decision theory (which could perhaps be justified by saying that we intrinsically value having a decision theory that isn't too alien)? I don't know what conclusions to draw from this line of thought, except that on close inspection, selfishness may offer just as many difficult philosophical problems as altruism.
Let it be noted, as an aside, that this is my first post on Less Wrong and my first attempt at original, non-mandatory writing for over a year.
I've been reading through the original sequences over the last few months as part of an attempt to get my mind into working order. (Other parts of this attempt include participating in Intro to AI and keeping a notebook.) The realization that spurred me to attempt this: I don't feel that living is good. The distinction which seemed terribly important to me at the time was that I didn't feel that death was bad, which is clearly not sensible. I don't have the resources to feel the pain of one death 155,000 times every day, which is why Torture v. Dust Specks is a nonsensical question to me and why I don't have a cached response for how to act on the knowledge of all those deaths.
The first time I read Torture v. Dust Specks, I started really thinking about why I bother trying to be rational. What's the point, if I still have to make nonsensical, kitschy statements like "Well, my brain thinks X but my heart feels Y," if I would not reflexively flip the switch and may even choose not to, and if I sometimes feel that a viable solution to overpopulation is more deaths?
I solved the lattermost with extraterrestrial settlement, but it's still, well, sketchy. My mind is clearly full of some pretty creepy thoughts, and rationality doesn't seem to be helping. I think about having that feeling and go eeugh, but the feelings are still there. So I pose the question: what does a person do to click that death is really, really bad?
The primary arguments I've heard for death are:
- "I look forward to the experience of shutting down and fading away," which I hope could be easily disillusioned by gaining knowledge about how truly undignified dying is, bloody romanticists.
- "There is something better after life and I'm excited for it," which, well... let me rephrase: please do not turn this into a discussion on ways to disillusion theists because it's really been talked about before.
- "It is Against Nature/God's Will/The Force to live forever. Nature/God/the Force is going to get humankind if we try for immortality. I like my liver!" This argument is so closely related to the previous and the next one that I don't know quite how to respond to it, other than that I've seen it crop up in historical accounts of any big change. Human beings tend to be really frightened of change, especially change which isn't believed to be supernatural in origin.
- "I've read science fiction stories about being immortal, and in those stories immortality gets really boring, really fast. I'm not interested enough in reality to be in it forever." I can't see where this perspective could come from other than mind-numbing ignorance/the unimaginable nature of really big things (like the number of languages on Earth, the amount of things we still don't know about physics or the fact that every person who is or ever will be is a new, interesting being to interact with.)
- "I can't imagine being immortal. My idea about how my life will go is that I will watch my children grow old, but I will die before they do. My mind/human minds aren't meant to exist for longer than one generation." This fails to account for human minds being very, very flexible. The human mind as we know it now does eventually get tired of life (or at least tired of pain,) but this is not a testament to how minds are, any more than humans becoming distressed when they don't eat is a testament to it being natural to starve, become despondent and die.
- "The world is overpopulated and if nobody dies, we will overrun and ultimately ruin the planet." First of all: I, like Dr. Ian Malcolm, think that it is incredibly vain to believe that man can destroy the Earth. Second of all: in the future we may have anything from extraterrestrial habitation to substrates which take up space and consume material in totally different ways. But! Clearly, I am not feeling these arguments, because this argument makes sense to me. Problematic!
I think that overall, the fear most people have about signing up for cryonics/AI/living forever is that they do not understand it. This is probably true for me; it's probably why I don't grok that life is good, always. Moreover, it is probable that the depictions of death as not always bad with which I sympathize (e.g. 'Lord, what can the harvest hope for, if not for the care of the Reaper Man?') stem from the fact that death has previously been held to be absolute. That is, up until the last ~30 years, people have not been having cogent, non-hypothetical thoughts about how it might be possible to not die or what that might be like. Dying has always been a Big Bad but an inescapable one, and the human race has a bad case of Stockholm Syndrome.
So: now that I know what I have and what I want, how do I use the former to get the latter?
A trolley (i.e. in British English a tram) is running out of control down a track. In its path are five people who have been tied to the track by a mad philosopher. Fortunately, you could flip a switch, which will lead the trolley down a different track to safety. Unfortunately, there is a single person tied to that track. Should you flip the switch or do nothing?
Participants with one kind of serotonin transporter (LL-homozygotes) judged flipping the switch to be better than a morally neutral action. Participants with the other kind (S-carriers) judged flipping the switch to be no better than a morally neutral action. The groups responded identically to the "fat man" scenario, both rejecting the 'push' option.
We hypothesized that 5-HTTLPR genotype would interact with intentionality in respondents who generated moral judgments. Whereas we predicted that all participants would eschew intentionally harming an innocent for utilitarian gains, we predicted that participants' judgments of foreseen but unintentional harm would diverge as a function of genotype. Specifically, we predicted that LL homozygotes would adhere to the principle of double effect and preferentially select the utilitarian option to save more lives despite unintentional harm to an innocent victim, whereas S-allele carriers would be less likely to endorse even unintentional harm. Results of behavioral testing confirmed this hypothesis.
Participants in this study judged the acceptability of actions that would unintentionally or intentionally harm an innocent victim in order to save others' lives. An analysis of variance revealed a genotype × scenario interaction, F(2, 63) = 4.52, p = .02. Results showed that, relative to long allele homozygotes (LL), carriers of the short (S) allele showed particular reluctance to endorse utilitarian actions resulting in foreseen harm to an innocent individual. LL genotype participants rated perpetrating unintentional harm as more acceptable (M = 4.98, SEM = 0.20) than did SL genotype participants (M = 4.65, SEM = 0.20) or SS genotype participants (M = 4.29, SEM = 0.30).
The results indicate that inherited variants in a genetic polymorphism that influences serotonin neurotransmission influence utilitarian moral judgments as well. This finding is interpreted in light of evidence that the S allele is associated with elevated emotional responsiveness.
The great moral philosopher Jeremy Bentham, founder of utilitarianism, famously said, 'The question is not, "Can they reason?" nor, "Can they talk?" but rather, "Can they suffer?"' Most people get the point, but they treat human pain as especially worrying because they vaguely think it sort of obvious that a species' ability to suffer must be positively correlated with its intellectual capacity.
Nevertheless, most of us seem to assume, without question, that the capacity to feel pain is positively correlated with mental dexterity - with the ability to reason, think, reflect and so on. My purpose here is to question that assumption. I see no reason at all why there should be a positive correlation. Pain feels primal, like the ability to see colour or hear sounds. It feels like the sort of sensation you don't need intellect to experience. Feelings carry no weight in science but, at the very least, shouldn't we give the animals the benefit of the doubt?
I can see a Darwinian reason why there might even be a negative correlation between intellect and susceptibility to pain. I approach this by asking what, in the Darwinian sense, pain is for. It is a warning not to repeat actions that tend to cause bodily harm. Don't stub your toe again, don't tease a snake or sit on a hornet, don't pick up embers however prettily they glow, be careful not to bite your tongue. Plants have no nervous system capable of learning not to repeat damaging actions, which is why we cut live lettuces without compunction.
It is an interesting question, incidentally, why pain has to be so damned painful. Why not equip the brain with the equivalent of a little red flag, painlessly raised to warn, "Don't do that again"?
[...] my primary question for today: would you expect a positive or a negative correlation between mental ability and ability to feel pain? Most people unthinkingly assume a positive correlation, but why?
Isn't it plausible that a clever species such as our own might need less pain, precisely because we are capable of intelligently working out what is good for us, and what damaging events we should avoid? Isn't it plausible that an unintelligent species might need a massive wallop of pain, to drive home a lesson that we can learn with less powerful inducement?
At the very least, I conclude that we have no general reason to think that non-human animals feel pain less acutely than we do, and we should in any case give them the benefit of the doubt. Practices such as branding cattle, castration without anaesthetic, and bullfighting should be treated as morally equivalent to doing the same thing to human beings.
Imagine a being so vast and powerful that its theory of mind of other entities would itself be a sentient entity. If this entity came across human beings, it might model those people at such a level of resolution that every imagination it has of them would itself be conscious.
Just as we do not grant rights to our thoughts, or to the bacteria that make up a big part of our body, such an entity might be unable to grant existential rights to its thought processes, even if those processes are detailed enough that its mere perception of a human being incorporates a human-level simulation.
But even for us humans it might not be possible to account for every being in our ethical conduct. It might not work to grant everything the rights it deserves. Nevertheless, the answer cannot be to abandon morality altogether, if only for the reason that human nature won't permit it: it is part of our preferences to be compassionate.
Our task must be to free ourselves . . . by widening our circle of compassion to embrace all living creatures and the whole of nature and its beauty.
— Albert Einstein
How do we solve this dilemma? Right now it's relatively easy to handle: there are humans, and then there is everything else. But even today — without uplifted animals, artificial intelligence, human-level simulations, cyborgs, chimeras and posthuman beings — it is increasingly hard to draw the line, for science is advancing rapidly, allowing us to keep alive people with severe brain injury or to save a premature fetus whose mother is already dead. Then there are the mentally disabled and other humans who are not neurotypical. We are also increasingly becoming aware that many non-human beings on this planet are far more intelligent and cognizant than expected.
And remember: what will be the case in the future has already been the case in our not-too-distant past. There was a time when three different human species lived at the same time on the same planet: three intelligent species of the Homo genus, yet very different. Only 22,000 years ago we, H. sapiens, were still sharing this oasis of life with Homo floresiensis and Homo neanderthalensis.
How would we handle such a situation today? At a time when we still haven't learnt to live together in peace. At a time when we are still killing even our own genus. Most of us are not even ready to become vegetarian in the face of global warming, although livestock farming accounts for 18% of the planet's greenhouse gas emissions.
So where do we draw the line?
The Moral Psychology Handbook (2010), edited by John Doris, is probably the best way to become familiar with the exciting interdisciplinary field of moral psychology. The chapters are written by philosophers, psychologists, and neuroscientists. A few of them are all three, and the university department to which they are assigned is largely arbitrary.
I should also note that the chapter authors happen to comprise a large chunk of my own 'moral philosophers who don't totally suck' list. The book is also exciting because it undermines or outright falsifies a long list of popular philosophical theories with - gasp! - empirical evidence.
Chapter 1: Evolution of Morality (Machery & Mallon)
The authors examine three interpretations of the claim that morality evolved. The claims "Some components of moral psychology evolved" and "Normative cognition is a product of evolution" are empirically well-supported but philosophically uninteresting. The stronger claim that "Moral cognition (a kind of normative cognition) evolved" is more philosophically interesting, but at present not strongly supported by the evidence (according to the authors).
The chapter serves as a compact survey of recent models for the evolution of morality in humans (Joyce, Hauser, de Waal, etc.), and attempts to draw philosophical conclusions about morality from these descriptive models (e.g. Joyce, Street).
Chapter 2: Multi-system Moral Psychology (Cushman, Young, & Greene)
The authors survey the psychological and neuroscientific evidence showing that moral judgments are both intuitive/affective/unconscious and rational/cognitive/conscious, and propose a dual-process theory of moral judgment. Scientific data is used to verify or falsify philosophical theories proposed as, for example, explanations for trolley-problem cases.
Consequentialist moral judgments are more associated with rational thought than deontological judgments are, but both deontological and consequentialist moral judgments have their sources in emotion. Deontological judgments are associated with 'alarm bell' emotions that circumvent reasoning and provide absolute demands on behavior. Alarm bell emotions are rooted in (for example) the amygdala. Consequentialist judgments are associated with 'currency' emotions that provide negotiable motivations weighing for and against particular behaviors, and are rooted in meso-limbic regions that track a stimulus' reward magnitude, reward probability, and expected value.
This chapter might be the best one in the book.
Chapter 3: Moral Motivation (Schroeder, Roskies, & Nichols)
The authors categorize philosophical theories of moral motivation into four groups:
- Instrumentalists think people are motivated when they form beliefs about how to satisfy pre-existing desires.
- Cognitivists think people are motivated merely by the belief that something is right or wrong.
- Sentimentalists think people are morally motivated only by emotions.
- Personalists think people are motivated by their character: their knowledge of good and bad, their wanting for good or bad, their emotions about good or bad, and their habits of responding to these three.
The authors then argue that the neuroscience of motivation fits best with the instrumentalist and personalist pictures of moral motivation, poses some problems for sentimentalists, and presents grave problems for cognitivists. The main weakness of the chapter is that its picture of the neuroscience of motivation is mostly drawn from a decade-old neuroscience textbook. As such, the chapter misses many new developments, especially the important discoveries occurring in neuroeconomics. Still, I can personally attest that the latest neuroscience comes down most strongly in favor of instrumentalists and personalists, though there are recent details that could have been included in this chapter.
Chapter 4: Moral Emotions (Prinz & Nichols)
The authors survey studies that illuminate the role of emotions in moral cognition, and discuss several models that have been proposed, concluding that the evidence currently respects each of them. They then focus on a more detailed discussion of two emotions that are particularly causal in the moral judgments of Western society: anger and guilt.
The chapter is strong in example experiments, but a higher-level discussion of the role of emotions in moral judgment is provided by chapter 2.
Chapter 5: Altruism (Stich, Doris, & Roedder)
The authors distinguish four kinds of desires: (1) desires for pleasure and avoiding pain, (2) self-interested desires, (3) desires that are neither self-interested nor for the well-being of others, and (4) desires for the well-being of others. Psychological hedonism maintains that all (terminal, as opposed to instrumental) desires are of type 1. Psychological egoism says that all desires are of type 2 (which includes type 1). Altruism claims that some desires fall into category 4. And if there are desires of type 3 but none of type 4, then both egoism and altruism are false.
The authors survey evolutionary arguments for and against altruism, but are not yet convinced by any of them.
Psychology, however, does support the existence of altruism, which seems to be "the product of an emotional response to another's distress." The authors survey the experimental evidence, especially the work of Batson. They conclude there is significant support for the existence of genuine human altruism. We are not motivated by selfishness alone.
Chapter 6: Moral Reasoning (Harman, Mason, & Sinnott-Armstrong)
The authors clarify the roles of conscious and unconscious moral reasoning, and reject one popular theory of moral reasoning: the deductive model. One of many reasons for their rejection of the deductive model is that it assumes we come to explicit moral conclusions by applying logic, probability theory, and decision theory to pre-existing moral principles, but in the deductive model these principles are understood in terms of psychological theories of concepts that are probably false. The authors survey the 'classical view of concepts' (concepts as defined in terms of necessary and sufficient conditions) and conclude that it is less likely to be true than alternate theories of mental concepts that are less friendly to the deductive model of moral reasoning.
The authors propose an alternate model of moral reasoning whereby one makes mutual adjustments to one's beliefs and plans and values in pursuit of what Rawls called 'reflective equilibrium.'
Chapter 7: Moral Intuitions (Sinnott-Armstrong, Young, & Cushman)
The authors refer to moral intuitions as "strong, stable, immediate moral beliefs." The 'immediate' part means that these moral beliefs do not arise through conscious reasoning; the subject is conscious only of the resulting moral belief.
Their project is this:
...moral intuitions are unreliable to the extent that morally irrelevant factors affect moral intuitions. When they are distorted by irrelevant factors, moral intuitions can be likened to mirages or seeing pink elephants while one is on LSD. Only when beliefs arise in more reputable ways do they have a fighting chance of being justified. Hence we need to know about the processes that produce moral intuitions before we can determine whether moral intuitions are justified.
Thus the chapter engages in something like Less Wrong-style 'dissolution to algorithm.'
A major weakness of this chapter is that it focuses on understanding intuitions as attribute-substitution heuristics, but ignores the other two major sources of intuitive judgments: evolutionary psychology and unconscious associative learning.
Chapter 8: Linguistics and Moral Theory (Roedder & Harman)
This chapter examines the 'linguistic analogy' in moral psychology - the analogy between Chomsky's 'universal grammar' and what has been called 'universal moral grammar.' The authors don't have any strong conclusions, but instead suggest that the linguistic analogy may be a helpful framework for pursuing further research. They list five ways in which the analogy is useful. This chapter can be skipped without missing much.
Chapter 9: Rules (Mallon & Nichols)
The authors survey the evidence that moral rules "are mentally represented and play a causal role in the production of judgment and behavior." This may be obvious, but it's nice to have the evidence collected somewhere.
Chapter 10: Responsibility (Knobe & Doris)
This chapter surveys the experimental studies that test people's attributions of moral responsibility. In short, people do not make such judgments according to invariant principles, as assumed by most of 20th century moral philosophy. (Moral philosophers have spent most of their time trying to find a set of principles that accounted for people's ordinary moral judgments, and showing that alternate sets of principles failed to capture people's ordinary moral judgments in particular circumstances.)
People adopt different moral criteria for judging different cases, even when they verbally endorse a simple set of abstract principles. This should not be surprising, as the same had already been shown to be true in linguistics and in non-moral judgment. The chapter surveys the variety of ways in which people adopt different moral criteria for different cases.
Chapter 11: Character (Merritt, Doris, & Harman)
This chapter surveys the evidence from situationist psychology, which undermines the 'robust character traits' view of human psychology upon which many varieties of virtue ethics depend.
Chapter 12: Well-Being (Tiberius & Plakias)
This chapter surveys competing concepts of 'well-being' in psychology, and provides reasons for using the 'life satisfaction' concept of well-being, especially in philosophy. The authors then discuss life satisfaction and normativity, for example the worry about the arbitrariness of the factors that lead to human life satisfaction.
Chapter 13: Race and Racial Cognition (Kelly, Machery, & Mallon)
I didn't read this chapter.
If our morality is complex and directly tied to what's human—if we're seeking to avoid building paperclip maximizers—how do you judge and quantify the danger in training yourself to become more rational, if doing so should drift you away from being human?
My friend is a skeptical theist. She, for instance, scoffs mightily at Camping's little dilemma/psychosis, but then argues from a position of comfort that the Rapture is a silly thing to predict because it's clearly stated that no one will know the day. And then she gives me a confused look, because the psychological dissonance is clear.
On one hand, my friend is in a prime position to take forward steps to self-examination and holding rational belief systems. On the other hand, she's an opera singer whose passion and profession require her to be able to empathize with and explore highly irrational human experiences. Since rationality is the art of winning, nobody can deny that the option that lets you have your cake and eat it too is best, but how do you navigate such a narrows?
In another example, a recent comment thread suggested the dangers of embracing human tendencies: catharsis might lead to promoting further emotional intensity. At the same time, catharsis is a well-appreciated human communication strategy with roots in the Greek stage. If rational action pulls you away from humanity, away from our complex morality, then how do we judge it worth doing?
The most immediate resolution to this conundrum appears to me to be that human morality has no consistency constraint: we can want to be powerful and able to win while also wanting to retain the human tendencies which directly impinge on that goal. Is there a theory of metamorality which allows you to infer how such tradeoffs should be managed? Or is human morality, as a program, flawed with inconsistencies that lead to inescapable cognitive dissonance and dehumanization? If you interpret morality as a self-supporting strange loop, is it possible to have unresolvable, drifting interpretations based on how you focus your attention?
Dual to the problem of resolving a way forward is the problem of the interpreter. If there is a goal to at least marginally increase the rationality of humanity, but in order to discover the means to do so you have to become less capable of empathizing with and communicating with humanity, who acts as an interpreter between the two divergent mindsets?
I just found 120 Euro (about $172) on the floor in the hallway in a hostel in Berlin. What should I do, and why?
- It's not inconceivable that the hostel might just take the money if I turn it in.
- I'll be at this hostel for about two more days.
Evolution. Morality. Strategy. Security/Cryptography. This hits so many topics of interest, I can't imagine it not being discussed here. Bruce Schneier blogs about his book-in-progress, The Dishonest Minority:
Humans evolved along this path. The basic mechanism can be modeled simply. It is in our collective group interest for everyone to cooperate. It is in any given individual's short-term self interest not to cooperate: to defect, in game theory terms. But if everyone defects, society falls apart. To ensure widespread cooperation and minimal defection, we collectively implement a variety of societal security systems.
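For readers who want the game-theoretic core spelled out, here is a minimal one-shot Prisoner's Dilemma in Python; the payoff numbers are the textbook convention (chosen only to satisfy temptation > reward > punishment > sucker), not anything from Schneier's book:

```python
# One-shot Prisoner's Dilemma: defection dominates individually, yet mutual
# defection pays everyone less than mutual cooperation.
PAYOFF = {  # (my move, their move) -> my payoff
    ("C", "C"): 3,  # R: reward for mutual cooperation
    ("C", "D"): 0,  # S: sucker's payoff
    ("D", "C"): 5,  # T: temptation to defect
    ("D", "D"): 1,  # P: punishment for mutual defection
}

for theirs in ("C", "D"):
    best = max(("C", "D"), key=lambda mine: PAYOFF[(mine, theirs)])
    print(f"If they play {theirs}, my best reply is {best}")
# Both lines print "D": defecting dominates for each individual, which is why
# societies add security systems that change the payoffs toward cooperation.
```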
I am somewhat reminded of Robin Hanson's Homo Hypocritus writings from the above, although it is not the same. Schneier says that the book is basically a first draft at this point, and might still change quite a bit. Some of the comments focus on whether "dishonest" is actually the best term to use for defecting from social norms.
This post is a bit of shameless self-promotion, but also a pointer to an example of Yudkowskian philosophy at work that LWers may enjoy, this time concerning philosophical theories of desire.
Episode 14 of my podcast with Alonzo Fyfe, Morality in the Real World, begins to dissolve some common philosophical debates about the nature of desire by replacing the symbol with the substance, etc. Transcript and links here, mp3 here. The episode can also probably serve as a big hint of where I'm going with my metaethics sequence.
Warning: Alonzo and I are not voice actors, and my sound engineering cannot compare to that of Radiolab.
I'm planning a top-level post (probably two or three or more) on when agent utility should not be part of utilitarian calculations - which seems to be an interesting and controversial topic given some recent posts. I'm looking for additional ideas, and particularly counterarguments. Also hunting for article titles. The series would look something like the following - noting that obviously this summary does not have much room for nuance or background argument. I'm assuming moral antirealism, with the selection of utilitarianism as an implemented moral system.
Intro - Utilitarianism has serious, fundamental measurement problems, and sometimes substantially contradicts our intuitions. One solution is to say our intuitions are wrong - this isn't quite right (i.e. a morality can't be "wrong") unless our intuitions are internally inconsistent, which I do not think is the problem. This is particularly problematic because agents (especially those with high self-modification capacities) may face socially undesirable incentives. I argue that a better solution is to ignore or discount the utility of certain agents in certain circumstances (a toy sketch of this adjusted aggregation follows the outline). This better fits general moral intuitions. (There remains a debate as to whether Morality A might be better than Morality B when Morality B better matches our general intuitions - I don't want to get into this, as I'm not sure there's a non-circular meaning of "better" as applied to morality that does not relate to moral intuitions.)
1 - First, expressly anti-utilitarian utility can be disregarded. Most of the cases of this are fairly simple and bright-line. No matter how much Bob enjoys raping people, the utility he derives from doing so is irrelevant, unless he drinks the utilitarian Kool-Aid and only, for example, engages in rape fantasies (in which case his utility is counted - the issue is not that his desire is bad, it's that his actions are). This gets into some slight line-drawing problems with, for example, utility derived from competition (as one may delight in defeating people - this probably survives, however, particularly since it is all consensual).
1.5 - The above point is also related to the issue of discounting the future utility of such persons; I'm trying to figure out if it belongs in this sequence. The example I plan to use (which makes pretty much the entire point) is as follows. You have some chocolate ice cream you have to give away. You can give it either to a small child or to a person who has just brutally beaten and molested that child. The child kinda likes chocolate ice cream; vanilla is his favorite flavor, but chocolate's OK. The adult absolutely, totally loves chocolate ice cream; it's his favorite food in the world. I, personally, give the kid the ice cream, and I think so does well over 90% of the general population. On the other hand, if the adult were simply someone who had an interest in molesting children, but scrupulously never acted on it, I would not discount his utility so cheerfully. This may simply belong as a separate post on its own on the utility value of punishment. I'd be interested in feedback on it.
2 - Finally, and trickiest, is the problem of utility conditioned on false beliefs. Take two examples: an African village stoning a child to death because they think she's a witch who has made it stop raining, and the same village curing that witch-hood by ritually dunking her in holy water (or by some other innocuous procedure). In the former case, there's massive disutility that occurs because people think it will solve a problem that it won't (I'm also a little unclear on what it would mean for the utility of the many to "outweigh" the utility of the one, but that's an issue I'll address in the intro article). In the latter, there's minimal disutility (maybe even positive utility), even though the ritual is equally impotent. The best answer seems to be that utility conditioned on false beliefs should be ignored to the extent that it is conditioned on false beliefs. Many people (myself included) celebrate religious holidays with no belief whatsoever in the underlying religion - there is substantial value in the gathering of family and community. Similarly, there is some value to the gathering of the community in both village cases; in the murder it doesn't outweigh the costs, in the baptism it very well might.
3 - (tentative) How this approach coincides with the unweighted approach in the long term. Basically, if we ignore certain kinds of utility, we will encourage agents to pursue other kinds of utility (if you can't burn witches to improve your harvest, perhaps you'll learn how to rotate crops better). The utility they pursue is likely to be of only somewhat lower value to them (or higher value in some cases, if they're imperfect, i.e. human). However, it will be of non-negative value to others. Thus, a policy-maker employing adjusted utilitarianism is likely to obtain better outcomes from an unweighted perspective. I'm not sure this point is correct or cogent.
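To make points 1 and 2 of the outline concrete, here is a toy sketch of the adjusted aggregation; the agents, numbers, and the "false-belief share" knob are all illustrative assumptions rather than a worked-out proposal:

```python
# Toy sketch of the adjusted aggregation proposed in points 1 and 2 above.
from dataclasses import dataclass

@dataclass
class UtilityClaim:
    amount: float
    anti_utilitarian: bool = False   # point 1: e.g. utility from acted-on rape
    false_belief_share: float = 0.0  # point 2: fraction conditioned on false beliefs

def adjusted_total(claims):
    total = 0.0
    for c in claims:
        if c.anti_utilitarian:
            continue                                    # disregard entirely (point 1)
        total += c.amount * (1 - c.false_belief_share)  # discount (point 2)
    return total

# The two village cases: similar communal-gathering value, but the stoning's
# value depends almost entirely on the false witch belief, and it has a victim.
stoning = [UtilityClaim(10, false_belief_share=0.9), UtilityClaim(-100)]
dunking = [UtilityClaim(10, false_belief_share=0.5), UtilityClaim(-1)]
print(adjusted_total(stoning), adjusted_total(dunking))  # -99.0 vs. 4.0
```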
I'm aware at least some of this is against Less Wrong canon. I'm curious whether people have counterarguments, objections, counterexamples, or general feedback on whether this would be a desirable series to spell out.
Eliezer mentions two challenges he often gets, "Friendly to who?" and "Oh, so you get to say what 'Friendly' means." At the moment I see only one true answer to these questions, which I give below. If you can propose alternatives in the comments, please do.
I suspect morality is in practice a multiplayer game, so talking about it needs multiple people to be involved. Therefore, let's imagine a dialogue between A and B.
A: Okay, so you're interested in Friendly AI. Who will it be Friendly toward?
B: Obviously the people who participate in making the system will decide how to program it, so they will decide who it is Friendly toward.
A: So the people who make the system decide what "Friendly" means?
B: Yes, in effect.
A: Then they could decide that it will be Friendly only toward them, or toward White people. Isn't that sort of selfishness or racism immoral?
B: I can try to answer questions about the world. If you can define morality in a way that lets me do experiments to discover what is moral and what is immoral, I can try to guess the results of those experiments and report them. What do you mean by morality?
A: I don't know. If it doesn't mean anything, why do people talk about morality so much?
B: People often profess beliefs to label themselves as members of a group. So far as I can tell, the belief that some things are moral and other things are not is one of those beliefs. I don't have any other explanation for why people talk so much about something that isn't subject to experimentation.
A: So if that's what morality is, then it's fundamentally meaningless unless I'm planning out what lies to tell in order to get positive regard from a potential ingroup, or better yet I manage to somehow deceive myself so I can truthfully conform to the consensus morality of my desired ingroup. If that's all it is, there's no constraint on how a Friendly AI works, right? Maybe you'll build it and it will only be Friendly toward you.
B: No, because I can't do it by myself. Suppose I approach you and say "I'm going to make a Friendly AI that lets me control it and doesn't care about anyone else's preference." Would you help me?
A: Obviously not.
B: Nobody else would either, so the only way I can unilaterally run the world with an FAI is to create it by myself, and I'm not up to that. There are a few other proposed notions of Friendliness that are nonviable for similar reasons. For example, suppose I approached you and said, "I'm going to make a Friendly AI that treats everyone fairly, but I don't want to let anybody inspect how it works." Would you help me?
A: No, because I wouldn't trust you. I'd assume that you plan to really make it Friendly only toward yourself, lie about it, and then drop the lie once the FAI had enough power that you didn't need the lie any more.
B: Right. Here's an ethical system that fails another way: "I'll make an FAI that cares about every human equally, no matter what they do." To keep it simple, let's assume that engineering humans to have strange desires for the purpose of manipulating the FAI is not possible. Would you help me build that?
A: Well, it fits with my intuitive notion of morality, but it's not clear what incentive I have to help. If you succeed, I seem to win equally at the end whether I help you or not. Why bother?
B: Right. There are several possible fixes for that. Perhaps if I don't get your help, I won't succeed, and the alternative is that someone else builds it poorly and your quality of life decreases dramatically. That gives you an incentive to help.
A: Not much of one. You'll surely need a lot of help, and maybe if all those other people help I won't have to. Everyone would make the same decision and nobody would help.
B: Right. I could solve that problem by paying helpers like you money, if I had enough money. Another option would be to tilt the Friendliness in the direction of helpers in proportion to how much they help me.
A: But isn't tilting the Friendliness unfair?
B: Depends. Do you want things to be fair?
A: Yes, for some intuitive notion of "fairness" I can't easily describe.
B: So if the AI cares what you want, that will cause it to figure out what you mean by "fair" and tend to make it happen, with that tendency increasing as it tilts more in your favor, right?
A: I suppose so. No matter what I want, if the AI cares enough about me, it will give me more of what I want, including fairness.
B: Yes, that's the best idea I have right now. Here's another alternative: What would happen if we only took action when there's a consensus about how to weight the fairness?
A: Well, 4% of the population are sociopaths. They, and perhaps others, would make ridiculous demands and prevent any consensus. Then we'd be waiting forever to build this thing, and someone else who doesn't care about consensus would move while we're dithering and make us irrelevant. Since we can't wait for a consensus, we'll have to do something reasonable without one, and it makes sense to proceed now. So how about it? Do you need help yet?
B: Nope, I don't know how to make it.
A: Damn. Hmm, do you think you'll figure it out before everybody else?
B: Probably not. There are a lot of everybody else. In particular, business organizations that optimize for profit have a lot of power and have fundamentally inhuman value systems. I don't see how I can take action before all of them.
A: Me either. We are so screwed.