This article poses some major questions concerning morality, each broken up into sub-questions intended to help in answering the major question; it's not a criticism of any morality in particular, but rather what I hope is a useful way to consider any moral system, and to help people challenge their assumptions about their own moral systems. I don't expect responses to try to answer these questions; indeed, I'd prefer you don't. My preferred responses would be changes, additions, clarifications, or challenges to the questions or to the objective of this article.
First major question: Could you morally advocate other people adopt your moral system?
This isn't as trivial a question as it seems on its face. Take a strawman hedonism, for a very simple example. Is a hedonist's pleasure maximized by encouraging other people to pursue -their- pleasure? Or would it be better served by convincing them to pursue other people's (a class of people of which our strawman hedonist is a member) pleasure?
It's not merely selfish moralities which suffer meta-moral problems. I've encountered a few near-Comtean altruists who will readily admit their morality makes them miserable; the idea that other people are worse off than them fills them with a deep guilt which they cannot resolve. If their goal is truly the happiness of others, spreading their moral system is a short-term evil. (It may be a long-term good, depending on how they do their accounting, but non-moral altruism isn't actually a rare quality, so I think an honest accounting would suggest their moral system doesn't add much additional altruism to the system, only a lot of guilt about the fact that not much altruistic action is taking place.)
Note: I use the word "altruism" here in its modern, non-Comtean sense. Altruism is that which benefits others.
Does your moral system make you unhappy, on the whole? Does it, like most moral systems, place a value on happiness? Would it make the average person less or more happy, if they and they alone adopted it? Are your expectations of the moral value of your moral system predicated on an unrealistic scenario of universal acceptance? Maybe your moral system isn't itself very moral.
Second: Do you think your moral system makes you a more moral person?
Does your moral system promote moral actions? How much of the attention you give your morality is spent feeling good because you feel like you've effectively promoted your moral system, rather than promoting the values inherent in it?
Do you behave any differently than you would if you operated under a "common law" morality, such as social norms and laws? That is, does your ethical system make you behave differently than if you didn't possess it? Are you evaluating the merits of your moral system solely on how it answers hypothetical situations, rather than how it addresses your day-to-day life?
Does your moral system promote behaviors you're uncomfortable with and/or could not actually do, such as pushing people in the way of trolleys to save more people?
Third: Does your moral system promote morality, or itself as a moral system?
Is the primary contribution of your moral system to your life adding outrage that other people -don't- follow your moral system? Do you feel that people who follow other moral systems are immoral even if they end up behaving in exactly the same way you do? Does your moral system imply complex calculations which aren't actually taking place? Is the primary purpose of your moral system encouraging moral behavior, or defining what the moral behavior would have been after the fact?
Considered as a meme or memeplex, does your moral system seem better suited to propagating itself than to encouraging morality? Do you think "The primary purpose of this moral system is ensuring that these morals continue to exist" could be an accurate description of your moral system? Does the moral system promote the belief that people who don't follow it are completely immoral?
Fourth: Is the major purpose of your morality morality itself?
This is a rather tough question to elaborate with further questions, so I suppose I should try to clarify a bit first: Take a strawman utilitarianism where "utility" -really is- what the morality is all about, where somebody has painstakingly gone through and assigned utility points to various things (this is kind of common in game-based moral systems, where you're just accumulating some kind of moral points, positive or negative). Or imagine (tough, I know) a religious morality where the sole objective of the moral system is satisfying God's will. That is, does your moral system define morality to be about something abstract and immeasurable, defined only in the context of your moral system? Is your moral system a tautology, which must be accepted to even be meaningful?
This one can be difficult to identify from the inside, because to some extent -all- human morality is tautological; you have to identify it with respect to other moralities, to see if it's a unique island of tautology, or whether it applies to human moral concerns in the general case. With that in mind, when you argue with other people about your ethical system, do they -always- seem to miss the point? Do they keep trying to reframe moral questions in terms of other moral systems? Do they bring up things which have nothing to do with (your) morality?
Related: Pinpointing Utility
Let's go for lunch at the Hypothetical Diner; I have something I want to discuss with you.
We will pick our lunch from the set of possible orders, and we will receive a meal drawn from the set of possible meals, O. Speaking in general, each possible order has an associated probability distribution over O. The Hypothetical Diner takes care to simplify your analysis; the probability distribution is trivial: you always get exactly what you ordered.

Again to simplify your lunch, the Hypothetical Diner offers only two choices on the menu: the Soup, and the Bagel.

To then complicate things so that we have something to talk about, suppose there is some set M of ways other things could be that may affect your preferences. Perhaps you have sore teeth on some days.

Suppose for the purposes of this hypothetical lunch date that you are VNM rational. Shocking, I know, but the hypothetical results are clear: you have a utility function, U. The domain of the utility function is the product of all the variables that affect your preferences (which meal, and whether your teeth are sore):

U: M x O -> utility
In our case, if your teeth are sore, you prefer the soup, as it is less painful. If your teeth are not sore, you prefer the bagel, because it is tastier:
U(sore & soup) > U(sore & bagel)
U(~sore & soup) < U(~sore & bagel)
Your global utility function can be partially applied to some m in M to get an "object-level" utility function U_m: O -> utility. Note that the restrictions of U made in this way need not have any resemblance to each other; they are completely separate.

It is convenient to think about and define these restricted "utility function patches" separately. Let's pick some units and datums so we can get concrete numbers for our utilities:

U_sore(soup) = 1 ; U_sore(bagel) = 0
U_unsore(soup) = 0 ; U_unsore(bagel) = 1

Those are separate utility functions now, so we could pick units and datums separately. Because of this, the sore numbers are totally incommensurable with the unsore numbers. Don't try to compare them between the utility functions or you will get type-poisoning. The actual numbers are just a straightforward encoding of the preferences mentioned above.
What if we are unsure about where we fall in M? Say you won't know whether your teeth are sore until you take the first bite; you have a probability distribution over M. Maybe you are 70% sure that your teeth won't hurt you today. What should you order?
Well, it's usually a good idea to maximize expected utility:
EU(soup) = 30%*U(sore&soup) + 70%*U(~sore&soup) = ???
EU(bagel) = 30%*U(sore&bagel) + 70%*U(~sore&bagel) = ???
Suddenly we need those utility function patches to be commensurable so that we can actually compute these, but we went and defined them separately. Darn. All is not lost, though: recall that they are just restrictions of a global utility function to a particular soreness-circumstance, with some (positive) linear transforms, f_m, thrown in to make the numbers nice:

f_sore(U(sore&soup)) = 1 ; f_sore(U(sore&bagel)) = 0
f_unsore(U(~sore&soup)) = 0 ; f_unsore(U(~sore&bagel)) = 1
At this point, it's just a bit of clever function-inverting and all is dandy. We can pick some linear transform g to be canonical, and transform all the utility function patches into that basis. So for all m, we can get g(U(m & o)) by inverting the f_m and then applying g:

g.U(sore & x) = (g.inv(f_sore).f_sore)(U(sore & x)) = k_sore*U_sore(x) + c_sore
g.U(~sore & x) = (g.inv(f_unsore).f_unsore)(U(~sore & x)) = k_unsore*U_unsore(x) + c_unsore

(I'm using . to represent composition of those transforms. I hope that's not too confusing.)
Linear transforms are really nice; all the inverting and composing collapses down to a scale k and an offset c for each utility function patch. Now we've turned our bag of utility function patches into a utility function quilt! One more bit of math before we get back to deciding what to eat:
EU(x) = P(sore) *(k_sore *U_sore(x) + c_sore) + (1-P(sore))*(k_unsore*U_unsore(x) + c_unsore)
Notice that the terms involving c_m do not involve x, meaning that the c_m terms don't affect our decision, so we can cancel them out and forget they ever existed! This is only true because I've implicitly assumed that P(m) does not depend on our actions. If it did, like if we could go to the dentist or take some painkillers, then it would be P(m | x), and c_m would be relevant in the whole joint decision.
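To see the cancellation concretely, here is a minimal Python sketch. The offsets are arbitrary numbers I made up, and the scales anticipate the k_unsore = 1/5 chosen below; whatever c_m we pick, both options' expected utilities shift by the same constant, so the decision is unchanged.

```python
# Minimal sketch: offsets c_m shift every option's EU by the same
# constant, so they never change which option maximizes expected
# utility (assuming P(m) is independent of our action).
P = {"sore": 0.3, "unsore": 0.7}                # distribution over M
U = {"sore":   {"soup": 1.0, "bagel": 0.0},     # patch U_sore
     "unsore": {"soup": 0.0, "bagel": 1.0}}     # patch U_unsore
k = {"sore": 1.0, "unsore": 0.2}                # scales k_m
c = {"sore": 5.0, "unsore": -3.0}               # arbitrary offsets c_m

def eu(option, offsets):
    return sum(P[m] * (k[m] * U[m][option] + offsets[m]) for m in P)

no_offsets = {"sore": 0.0, "unsore": 0.0}
for offsets in (no_offsets, c):
    print(eu("soup", offsets), eu("bagel", offsets))
# 0.3  0.14    (without offsets)
# -0.3 -0.46   (with offsets: both shifted by -0.6; soup still wins)
```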
We can define the canonical utility basis g to be whatever we like (among positive linear transforms); for example, we can make it equal to f_sore so that we can at least keep the simple numbers from U_sore. Then we throw all the c_m's away, because they don't matter. Then it's just a matter of getting the remaining k_m's.

Ok, sorry, those last few paragraphs were rather abstract. Back to lunch. We just need to define these mysterious scaling constants and then we can order lunch. There is only one left, k_unsore. (In general there will be n-1 of them, where n is the size of M.) I think the easiest way to approach this is to let k_unsore = 1/5 and see what that implies:
g.U(sore & soup) = 1 ; g.U(sore & bagel) = 0
g.U(~sore & soup) = 0 ; g.U(~sore & bagel) = 1/5

EU(soup) = (1-P(~sore))*1 = 0.3
EU(bagel) = P(~sore)*k_unsore = 0.14
EU(soup) > EU(bagel)
After all the arithmetic, it looks like if k_unsore = 1/5, then even though we expect you to have non-sore teeth (P(sore) = 0.3), we are unsure enough, and the relative importance is big enough, that we should play it safe and go with the soup anyway. In general we would choose soup if P(~sore) < 1/(k_unsore+1), or equivalently, if k_unsore < (1-P(~sore))/P(~sore).
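If you want to check the algebra, here's a quick sketch that sweeps k_unsore and confirms the decision flips exactly at that threshold:

```python
# Sketch: sweep k_unsore and confirm the soup/bagel switch happens
# exactly at the threshold (1 - P(~sore)) / P(~sore) = 3/7.
p_unsore = 0.7
threshold = (1 - p_unsore) / p_unsore

for k_unsore in (0.2, 0.4, 0.5, 1.0):
    eu_soup = (1 - p_unsore) * 1.0     # soup only pays off when sore
    eu_bagel = p_unsore * k_unsore     # bagel only pays off when not sore
    choice = "soup" if eu_soup > eu_bagel else "bagel"
    predicted = "soup" if k_unsore < threshold else "bagel"
    print(k_unsore, choice, choice == predicted)
# 0.2 soup True / 0.4 soup True / 0.5 bagel True / 1.0 bagel True
```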
k is somehow the relative importance of possible preference structures under uncertainty. A smaller k in this lunch example means that the tastiness of a bagel over soup is small relative to the pain saved by eating the soup instead. With this intuition, we can see that 1/5 is a somewhat reasonable value for this scenario, while a value like 1, for example, would not be.
What if we are uncertain about k? Are we simply pushing the problem up some meta-chain? It turns out that no, we are not. Because k is linearly related to utility, you can simply use its expected value if it is uncertain.
It's kind of ugly to have these k_m's and these U_m's, so we can just reason over the product K x M instead of over M and K separately. This is nothing weird; it just means we have more utility function patches (many of which encode the exact same object-level preferences).
In the most general case, the utility function patches in K x M are the space of all functions O -> RR, with offset equivalence but not scale equivalence (sovereign utility functions have full linear-transform equivalence, but these patches are only equivalent under offset). Remember, though, that these are just restricted patches of a single global utility function.
So what is the point of all this? Are we just playing in the VNM sandbox, or is this result actually interesting for anything besides sore teeth?
Perhaps Moral/Preference Uncertainty? I didn't mention it until now because it's easier to think about lunch than a philosophical minefield, but it is the point of this post. Sorry about that. Let's conclude with everything restated in terms of moral uncertainty.
If we have:

- a set O of object-level outcomes,
- a set M of "epiphenomenal" (outside of O) 'moral' outcomes,
- a probability distribution over M, possibly correlated with uncertainty about O, but not in a way that allows our actions to influence uncertainty over M (that is, assuming moral facts cannot be changed by your actions),
- a utility function over O for each possible value of M (these can be arbitrary VNM-rational moral theories, as long as they share the same object-level outcomes),
- and a wish to be VNM rational over whatever uncertainty we have,

then we can quilt together a global utility function U: (M, K, O) -> RR, where U(m,k,o) = k*U_m(o), so that EU(o) is the sum over all m of P(m)*E(k | m)*U_m(o).
Somehow this all seems like legal VNM.
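To see the recipe end-to-end, here's a toy sketch; the two "moral theories", their credences, and their scales are all invented for illustration:

```python
# Toy quilt: EU(o) = sum over m of P(m) * E(k|m) * U_m(o).
P_m = {"theory_A": 0.6, "theory_B": 0.4}        # credence in each theory
E_k = {"theory_A": 1.0, "theory_B": 0.5}        # expected scale E(k|m)
U_m = {"theory_A": {"act1": 2.0, "act2": 0.0},  # object-level utilities
       "theory_B": {"act1": 0.0, "act2": 3.0}}

def eu(outcome):
    return sum(P_m[m] * E_k[m] * U_m[m][outcome] for m in P_m)

print(eu("act1"), eu("act2"))   # 1.2 0.6 -> act1 wins despite theory_B
```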
So. Just the possible object-level preferences and a probability distribution over those is not enough to define our behaviour. We need to know the scale for each so we know how to act when uncertain. This is analogous to the switch from ordinal preferences to interval preferences when dealing with object-level uncertainty.
Now we have a well-defined framework for reasoning about preference uncertainty, if all our possible moral theories are VNM rational, moral facts are immutable, and we have a joint probability distribution over K x M.
In particular, updating your moral beliefs upon hearing new arguments is no longer a mysterious dynamic; it is just a Bayesian update over possible moral theories.
This requires a "moral prior" that corellates moral outcomes and their relative scales to the observable evidence. In the lunch example, we implicitly used such a moral prior to update on observable thought experiments and conclude that
1/5 was a plausible value for
Moral evidence is probably things like preference thought-experiments, neuroscience and physics results, etc. The actual model for this, and discussion about the issues with defining and reasoning on such a prior are outside the scope of this post.
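As a toy illustration of such an update (the prior and the likelihoods assigned to a thought-experiment "observation" e are invented), it is an ordinary Bayes step:

```python
# Toy Bayes step over moral theories, given a piece of "moral
# evidence" e (e.g., an intuition pumped by a thought experiment).
prior      = {"theory_A": 0.5, "theory_B": 0.5}
likelihood = {"theory_A": 0.8, "theory_B": 0.2}   # P(e | theory), invented

z = sum(prior[m] * likelihood[m] for m in prior)           # P(e)
posterior = {m: prior[m] * likelihood[m] / z for m in prior}
print(posterior)   # {'theory_A': 0.8, 'theory_B': 0.2}
```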
This whole argument couldn't prove its way out of a wet paper bag, and is merely suggestive. Bits and pieces may be found incorrect, and formalization might change things a bit.
This framework requires that we have already worked out the outcome-space O (which we haven't), have limited our moral confusion to a set of VNM-rational moral theories over O (which we haven't), and have defined a "Moral Prior" so we can have a probability distribution over moral theories and their weights (which we haven't).
Nonetheless, we can sometimes get those things in special limited cases, and even in the general case, having a model for moral uncertainty and updating is a huge step up from the terrifying confusion I (and everyone I've talked to) had before working this out.
A few days ago I was rereading one of my favourite graphic novels. In it the supervillain commits mass murder to prevent nuclear war - he kills millions to save billions. This got me thinking about how a lot of LessWrong/Effective Altruism people approach existential risks (xrisks). An existential risk is one that threatens the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development (Bostrom 2002). I'm going to point out an implication of this approach, show how this conflicts with a number of intuitions, and then try to clarify the conflict.
If murder would reduce xrisk, one should commit the murder. The argument for this is that compared to billions or even trillions of future people, and/or the amount of valuable things they could instantiate (by experiencing happiness or pleasure, performing acts of kindness, creating great artworks, etc), the importance of one present person, and/or the badness of committing (mass) murder, is quite small. The large number on the 'future' side outweighs or cancels the far smaller number on the 'present' side.
I can think of a number of scenarios in which the murder of one or more people could quite clearly reduce existential risk: for instance, killing the only people who know the location of some secret refuge, so that it can never be revealed.
Indeed at the extreme it would seem that reducing xrisk would justify some truly terrible things, like a preemptive nuclear strike on a rogue country.
This implication does not just hold for simplistic act-utilitarians, or consequentialists more broadly - it affects any moral theory that accords moral weight to future people and doesn't forbid murder.
This implication is implicitly endorsed in a common choice many of us make between focusing our resources on xrisk reduction as opposed to extreme poverty reduction. This is sometimes phrased as being about choosing to save one life now or far more future lives. While bearing in mind some complications (such as the debate over doing vs allowing and the Doctrine of Double Effect), it seems that 'letting several people die from extreme poverty to try to reduce xrisk' is in an important way similar to 'killing several people to try to reduce xrisk'.
II. Simple Objection:
A natural reaction to this implication is that this is wrong, one shouldn't commit murder to reduce xrisk. To evade some simple objections let us assume that we can be highly sure that the (mass) murder will indeed reduce xrisk: maybe no-one will find out about the murder, or it won't open a position for someone even worse.
Let us try and explain this reaction, and offer an objection: The idea that we should commit (mass) murder conflicts with some deeply held intuitions, such as the intuition that one shouldn't kill, and the intuition that one shouldn't punish a wrong-doer before she/he commits a crime.
One response - the most prominent advocate of which is probably Peter Singer - is to cast doubt onto our intuitions. We may have these intuitions, but they may have been induced by various means, e.g. by evolution or society. Racist views were common in past societies. Moreover, there is some evidence that humans may have an evolutionary predisposition to be racist. Nevertheless we reject racism, and therefore (so the argument goes) we should reject a number of other intuitions. So perhaps we should reject the intuitions we have, shrug off the squeamishness, and agree that (mass) murder to reduce xrisk is justified.
[NB: I'm unsure about how convincing this response is. Two articles in Philosophy and Public Affairs dispute Singer's argument (Berker 2009) (Kamm 2009). One must also take into account the problem of applying our everyday intuitions to very unusual situations - see 'How Outlandish Can Imaginary Cases Be?' (Elster 2011)]
The trope of the supervillain justifying his or her crimes by claiming they had to be done for 'the greater good' (or similar) is well established; TV Tropes calls it Utopia Justifies The Means. I find myself slightly troubled when my moral beliefs lead me to agree with fictional supervillains. Nevertheless, is the best option to bite the bullet and side with the supervillains?
III. Complex Objection:
Let us return to the fictional example with which we started. Part of the reason his act seems wrong is that, in real life, the supervillain's mass murder was not necessary to prevent nuclear war - the Cold War ended without large-scale direct conflict between the USA and USSR. This seems to point the way to (some) clarification.
I find my intuitions change when the risk seems higher. While I'm unsure that murder is the right answer in the examples given above, it seems clearer in a situation where the disaster is in the midst of occurring, and murder or mass murder is the only way to prevent an existential disaster. The hypothetical that works for me is imagining some incredibly virulent disease or 'grey-goo' nano-replicator that has swept over Australia and is about to spread, and the only way to stop it is a nuclear strike.
One possibility is that my having a different intuition is simply because the situation is similar to hypotheticals that seem more familiar, such as shooting a hostage-taker or terrorist if that was the only way to prevent loss of innocent life.
But I'd like to suggest that it perhaps reflects a problem with xrisks, that it is the idea of doing something awful for a very uncertain benefit. The problem is the uncertainty. If a (mass) murder would prevent an existential disaster, then one should do it, but when it merely reduces xrisk it is less clear. Perhaps there should be some sort of probability threshold - if one has good reason to think the probability is over certain limits (10%, 50%, etc) then one is justified in committing gradually more heinous acts.
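To illustrate the tension between a raw expected-value rule and a probability threshold, here is a toy sketch; every number in it is invented for illustration, not a serious model:

```python
# Toy numbers only: a naive expected-value rule vs. a probability
# threshold for "justifying" a terrible act.
cost_in_lives = 1e6               # lives taken by the act
future_lives = 1e12               # lives protected if the act works
p_threshold = 0.10                # proposed justification threshold

for p in (1e-9, 1e-6, 1e-3, 0.10, 0.50):
    naive_ev_ok = p * future_lives > cost_in_lives
    threshold_ok = p >= p_threshold
    print(p, naive_ev_ok, threshold_ok)
# Naive EV already says "yes" for any p above 1e-6, far below the
# 10% threshold -- which is exactly the tension described above.
```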
In this post I've been trying to explain a troubling worry - to lay out my thinking - more than I have been trying to argue for or against an explicit claim. I have a problem with the claim that xrisk reduction is the most important task for humanity and/or me. On the one hand it seems convincing, yet on the other it seems to lead to some troubling implications - like justifying not focusing on extreme poverty reduction, or justifying (mass) murder.
Comments and criticism of the argument are welcomed. Also, I would be very interested in hearing people's opinions on this topic. Do you think that 'reducing xrisk' can justify murder? At what scale? Perhaps more importantly, does that bother you?
DISCLAIMER: I am in no way encouraging murder. Please do not commit murder.
Like many members of this community, reading the sequences has opened my eyes to a heavily neglected aspect of morality. Before reading the sequences I focused mostly on how to best improve people's wellbeing in the present and the future. However, after reading the sequences, I realized that I had neglected a very important question: In the future we will be able to create creatures with virtually any utility function imaginable. What sort of values should we give the creatures of the future? What sort of desires should they have, from what should they gain wellbeing?
Anyone familiar with the sequences should be familiar with the answer. We should create creatures with the complex values that human beings possess (call them "humane values"). We should avoid creating creatures with simple values that only desire to maximize one thing, like paperclips or pleasure.
It is important that future theories of ethics formalize this insight. I think we all know what would happen if we programmed an AI with conventional utilitarianism: It would exterminate the human race and replace them with creatures whose preferences are easier to satisfy (if you program it with preference utilitarianism) or creatures whom it is easier to make happy (if you program it with hedonic utilitarianism). It is important to develop a theory of ethics that avoids this.
Lately I have been trying to develop a modified utilitarian theory that formalizes this insight. My focus has been on population ethics. I am essentially arguing that population ethics should not just focus on maximizing welfare; it should also focus on what sort of creatures it is best to create. According to this theory of ethics, it is possible for a population with a lower total level of welfare to be better than a population with a higher total level of welfare, if the lower population consists of creatures that have complex humane values, while the higher welfare population consists of paperclip or pleasure maximizers. (I wrote a previous post on this, but it was long and rambling; I am trying to make this one more accessible.)
One of the key aspects of this theory is that it does not necessarily rate the welfare of creatures with simple values as unimportant. On the contrary, it considers it good for their welfare to be increased and bad for their welfare to be decreased. Because of this, it implies that we ought to avoid creating such creatures in the first place, so it is not necessary to divert resources from creatures with humane values in order to increase their welfare.
My theory does allow the creation of simple-value creatures for two reasons. One is if the benefits they generate for creatures with humane values outweigh the harms generated when humane-value creatures must divert resources to improving their welfare (companion animals are an obvious example of this). The second is if creatures with humane values are about to go extinct, and the only choices are replacing them with simple value creatures, or replacing them with nothing.
So far I am satisfied with the development of this theory. However, I have hit one major snag, and would love it if someone else could help me with it. The snag is formulated like this:
1. It is better to create a small population of creatures with complex humane values (that has positive welfare) than a large population of animals that can only experience pleasure or pain, even if the large population of animals has a greater total amount of positive welfare. For instance, it is better to create a population of humans with 50 total welfare than a population of animals with 100 total welfare.
2. It is bad to create a small population of creatures with humane values (that has positive welfare) and a large population of animals that are in pain. For instance, it is bad to create a population of animals with -75 total welfare, even if doing so allows you to create a population of humans with 50 total welfare.
3. However, it seems like, if creating human beings wasn't an option, it might be okay to create a very large population of animals, the majority of which have positive welfare, but some of which are in pain. For instance, it seems like it would be good to create a population of animals where one section of the population has 100 total welfare and another section has -75, since the total welfare is 25.
The problem is that this leads to what seems like a circular preference. If the population of animals with 100 welfare existed by itself it would be okay to not create it in order to create a population of humans with 50 welfare instead. But if the population we are talking about is the one in (3) then doing that would result in the population discussed in (2), which is bad.
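To make the circularity explicit, here is a sketch of the three judgments as pairwise "better than" relations (my own encoding; in particular, reading judgment 1 as licensing a swap of the +100 animal block for the +50 humans inside population 3 is an interpretive step):

```python
# Populations: HA    = humans +50 plus suffering animals -75
#              A_mix = animals +100 plus suffering animals -75 (net +25)
#              NONE  = create nothing
better_than = [
    ("HA", "A_mix"),     # judgment 1: humans beat the happy-animal block
    ("A_mix", "NONE"),   # judgment 3: net-positive animal world is good
    ("NONE", "HA"),      # judgment 2: humans + suffering animals is bad
]

def has_cycle_from(start):
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        for a, b in better_than:
            if a == node:
                if b == start:
                    return True        # returned to start: a cycle
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
    return False

print(has_cycle_from("HA"))   # True: HA > A_mix > NONE > HA
```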
My current solution to this dilemma is to include a stipulation that a population with negative utility can never be better than one with positive utility. This prevents me from having circular preferences about these scenarios. But it might create some weird problems. If population (2) is created anyway, and the humans in it are unable to help the suffering animals in any way, does that mean they have a duty to create lots of happy animals to get their population's utility up to a positive level? That seems strange, especially since creating the new happy animals won't help the suffering ones in any way. On the other hand, if the humans are able to help the suffering animals, and they do so by means of some sort of utility transfer, then it would be in their best interests to create lots of happy animals, to reduce the amount of utility each person has to transfer.
So far some of the solutions I am considering include:
1. Instead of focusing on population ethics, just consider complex humane values to have greater weight in utility calculations than pleasure or paperclips. I find this idea distasteful because it implies it would be acceptable to inflict large harms on animals for relatively small gains for humans. In addition, if the weight is not sufficiently great it could still lead to an AI exterminating the human race and replacing them with happy animals, since animals are easier to take care of and make happy than humans.
2. It is bad to create the human population in (2) if the only way to do so is to create a huge amount of suffering animals. But once both populations have been created, if the human population is unable to help the animal population, they have no duty to create as many happy animals as they can. This is because the two populations are not causally connected, and that is somehow morally significant. This makes some sense to me, as I don't think the existence of causally disconnected populations in the vast universe should bear any significance on my decision-making.
3. There is some sort of overriding consideration besides utility that makes (3) seem desirable. For instance, it might be bad for creatures with any sort of values to go extinct, so it is good to create a population to prevent this, as long as its utility is positive on the net. However, this would change in a situation where utility is negative, such as in (2).
4. Reasons to create a creature have some kind of complex rock-paper-scissors-type "trumping" hierarchy. In other words, the fact that the humans have humane values can override the reasons to create happy animals, but cannot override the reasons not to create suffering animals. The reasons to create happy animals, however, can override the reasons not to create suffering animals. I think that this argument might lead to inconsistent preferences again, but I'm not sure.
I find none of these solutions very satisfying. I would really appreciate it if someone could help me solve this dilemma. I'm very hopeful about this ethical theory, and would like to see it improved.
*Update. After considering the issue some more, I realized that my dissatisfaction came from conflating two different scenarios. I was treating the scenario "Animals with 100 utility and animals with -75 utility are created, no humans are created at all" as the same as the scenario "Humans with 50 utility and animals with -75 utility are created, then the humans (before they get to experience their 50 utility) are killed/harmed in order to create more animals without helping the suffering animals in any way." They are clearly not the same.
To make the analogy more obvious, imagine I was given a choice between creating a person who would experience 95 utility over the course of their life, or a person who would experience 100 utility over the course of their life. I would choose the person with 100 utility. But if the person destined to experience 95 utility already existed, but had not experienced the majority of that utility yet, I would oppose killing them and replacing them with the 100 utility person.
Or to put it more succinctly, I am willing to not create some happy humans to prevent some suffering animals from being created. And if the suffering animals and happy humans already exist I am willing to harm the happy humans to help the suffering animals. But if the suffering animals and happy humans already exist I am not willing to harm the happy humans to create some extra happy animals that will not help the existing suffering animals in any way.
Psychologists have discovered that while reading a book or story, people are prone to subconsciously adopt the behavior, thoughts, beliefs and internal responses of fictional characters as if they were their own.
Experts have dubbed this subconscious phenomenon ‘experience-taking,’ where people actually change their own behaviors and thoughts to match those of a fictional character that they can identify with.
Researchers from Ohio State University conducted a series of six experiments on about 500 participants, reported in the Journal of Personality and Social Psychology, and found that in the right situations 'experience-taking' may lead to temporary real-world changes in the lives of readers.
They found that stories written in the first-person can temporarily transform the way readers view the world, themselves and other social groups.
I always wondered at how Christopher Hitchens (who, when he wasn't being a columnist, was a professor of English literature) went on and on about the power of fiction for revealing moral truths. This gives me a better idea of how people could imprint on well-written fiction. More so than, say, logically-reasoned philosophical tracts.
This article is, of course, a popularisation. Anyone have links to the original paper?
Edit: Gwern delivers (PDF): Kaufman, G. F., & Libby, L. K. (2012, March 26). "Changing Beliefs and Behavior Through Experience-Taking." Journal of Personality and Social Psychology. Advance online publication. doi: 10.1037/a0027525
This relates to my recent post on existence in many-worlds.
I care about possible people. My child, if I ever have one, is one of them, and it seems monstrous not to care about one's children. There are many distinct ways of being a possible person:

1) You can be causally connected to some actual people in the actual world in some histories of that world.
2) You can be a counterpart of an actual person on a distinct world without causal connections.
3) You can be distinct from all actual individuals, and in a causally separate possible world.
4) You can be acausally connectable to actual people, but in distinct possible worlds.
Those 4 ways are not separate partitions without overlap; sometimes they overlap, and I don't believe they exhaust the scope of possible people. The most natural question to ask is "should we care equally about all kinds of possible people?" Some people are seriously studying this, and let us hope they give us accurate ways to navigate our complex universe. While we wait, some worries seem relevant:
1) The Multiverse is Sadistic Argument:
P1.1: If all possible people do their morally relevant thing (call it exist, if you will) and
P1.2: We cannot affect (causally or acausally) what is or not possible
C1.0: Then we cannot affect the morally relevant thing.
2) The Multiverse is Paralyzing (related)
P2.1: We have reason to care about X-Risk
P2.2: Worlds where X-Risk obtains are possible
P2.3: We have nearly as much reason to worry about possible non-actual1 worlds where X-risk obtains as we have to worry about actual1 worlds where it obtains
P2.4: There are infinitely more possible worlds where X-risk obtains than actual1 ones
C2.0: Infinitarian Paralysis
1Actual here means belonging to the same quantum branching history as you. If you think you have many quantum successors, all of them are actual, same for predecessors, and people who inhabit your Hubble volume.
3) Reality-Fluid Can't Be All That Is Left Argument
P3.1) If all possible people do their morally relevant thing
P3.2) The way in which we can affect what is possible is by giving some subsets of it more units of reality-fluid, or quantum measure
P3.3) In fact reality-fluid is a ratio, such as a percentage of successor worlds of kind A or kind B for a particular world W
P3.4) A possible World3 with 5% reality-fluid in relation to World1 is causally indistinguishable from itself with five times more reality-fluid (25%) in relation to World2.
P3.5) The morally relevant thing, though qualitative by constitution, seems to be quantifiable, and what matters is its absolute quantity, not any kind of ratio.
C3.1: From 3.2 and 3.3 -> We can actually affect only a quantity that is relative to our world, not an absolute quantity.
C3.2: From C3.1 and P 3.5 -> We can't affect the relevant thing.
C3.3: We ended up having to talk about reality fluid because decisions matter, and reality fluid is the thing that decision changes (from P3.4 we know it isn't causal structure). But if all that decision changes is some ratio between worlds, and what matters by P3.5 is not a ratio between worlds, we have absolutely no clue of what we are talking about when we talk about "the thing that matters" "what we should care about" and "reality fluid".
These arguments are offered not as perfectly logical, airtight argument structures, but to at least induce nausea about talk of Reality-Fluid, Measure, morally relevant things in many-worlds, and morally relevant people causally disconnected from us. Those are not things where you can Taboo the word away and keep the substance around. The problem does not lie in the word 'Existence', or in the sentence 'X is morally relevant'. It seems to me that the service that existence or reality used to perform doesn't make sense anymore (if all possible worlds exist, or if the Mathematical Universe Hypothesis is correct). We attempted to keep it around as a criterial determinant for What Matters. Yet now all that is left is this weird ratio that just can't be what matters. Without a criterial determinant for mattering, we are left in a position that makes me think we should head back towards a causal approach to morality. But this is an opinion, not a conclusion.
Edit: This post is an argument against the conjunctive truth of two things, Many Worlds, and the way in which we think of What Matters. It seems that the most natural interpretation of it is that Many Worlds is true, and thus my argument is against our notion of What Matters. In fact my position lies more in the opposite side - our notion of What Matters is (strongly related to) What Matters, so Many Worlds are less likely.
let me suggest a moral axiom with apparently very strong intuitive support, no matter what your concept of morality: morality should exist. That is, there should exist creatures who know what is moral, and who act on that. So if your moral theory implies that in ordinary circumstances moral creatures should exterminate themselves, leaving only immoral creatures, or no creatures at all, well that seems a sufficient reductio to solidly reject your moral theory.
I agree strongly with the above quote, and I think most other readers will as well. It is good for moral beings to exist and a world with beings who value morality is almost always better than one where they do not. I would like to restate this more precisely as the following axiom: A population in which moral beings exist and have net positive utility, and in which all other creatures in existence also have net positive utility, is always better than a population where moral beings do not exist.
While the axiom that morality should exist is extremely obvious to most people, there is one strangely popular ethical system that rejects it: total utilitarianism. In this essay I will argue that Total Utilitarianism leads to what I will call the Genocidal Conclusion, which is that there are many situations in which it would be fantastically good for moral creatures to either exterminate themselves, or greatly limit their utility and reproduction in favor of the utility and reproduction of immoral creatures. I will argue that the main reason consequentialist theories of population ethics produce such obviously absurd conclusions is that they continue to focus on maximizing utility1 in situations where it is possible to create new creatures. I will argue that pure utility maximization is only a valid ethical theory for "special case" scenarios where the population is static. I will propose an alternative theory for population ethics I call "ideal consequentialism" or "ideal utilitarianism" which avoids the Genocidal Conclusion and may also avoid the more famous Repugnant Conclusion.
I will begin my argument by pointing to a common problem in population ethics known as the Mere Addition Paradox (MAP) and the Repugnant Conclusion. Most Less Wrong readers will already be familiar with this problem, so I do not think I need to elaborate on it. You may also be familiar with an even stronger variation called the Benign Addition Paradox (BAP). This is essentially the same as the MAP, except that each time one adds more people one also gives a small amount of additional utility to the people who already existed. One then proceeds to redistribute utility between people as normal, eventually arriving at the huge population where everyone's lives are "barely worth living." The point of this is to argue that the Repugnant Conclusion can be arrived at from "mere addition" of new people that not only doesn't harm the pre-existing people, but actually benefits them.
The next step of my argument involves three slightly tweaked versions of the Benign Addition Paradox. I have not changed the basic logic of the problem, I have just added one small clarifying detail. In the original MAP and BAP it was not specified what sort of values the added individuals in population A+ held. Presumably one was meant to assume that they were ordinary human beings. In the versions of the BAP I am about to present, however, I will specify that the extra individuals added in A+ are not moral creatures, that if they have values at all they are values indifferent to, or opposed to, morality and the other values that the human race holds dear.
1. The Benign Addition Paradox with Paperclip Maximizers.
Let us imagine, as usual, a population, A, which has a large group of human beings living lives of very high utility. Let us then add a new population consisting of paperclip maximizers, each of whom is living a life barely worth living. Presumably, for a paperclip maximizer, this would be a life where the paperclip maximizer's existence results in at least one more paperclip in the world than there would have been otherwise.
Now, one might object that if one creates a paperclip maximizer and then allows it to create one paperclip, the utility of the other paperclip maximizers will increase above the "barely worth living" level, which would obviously make this thought experiment non-analogous with the original MAP and BAP. To prevent this we will assume that each paperclip maximizer that is created has slightly different values about the ideal size, color, and composition of the paperclip it is trying to produce. So the Purple 2-centimeter Plastic Paperclip Maximizer gains no additional utility when the Silver Iron 1-centimeter Paperclip Maximizer makes a paperclip.
So again, let us add these paperclip maximizers to population A, and in the process give one extra utilon of utility to each preexisting person in A. This is a good thing, right? After all, everyone in A benefited, and the paperclippers get to exist and make paperclips. So clearly A+, the new population, is better than A.
Now let's take the next step, the transition from population A+ to population B. Take some of the utility from the human beings and convert it into paperclips. This is a good thing, right?
So let us repeat these steps adding paperclip maximizers and utility, and then redistributing utility. Eventually we reach population Z, where there is a vast amount of paperclip maximizers, a vast amount of many different kinds of paperclips, and a small amount of human beings living lives barely worth living.
Obviously Z is better than A, right? We should not fear the creation of a paperclip maximizing AI, but welcome it! Forget about things like high challenge, love, interpersonal entanglement, complex fun, and so on! Those things just don't produce the kind of utility that paperclip maximization has the potential to do!
Or maybe there is something seriously wrong with the moral assumptions behind the Mere Addition and Benign Addition Paradoxes.
But you might argue that I am using an unrealistic example. Creatures like Paperclip Maximizers may be so far removed from normal human experience that we have trouble thinking about them properly. So let's replay the Benign Addition Paradox again, but with creatures we might actually expect to meet in real life, and we know we actually value.
2. The Benign Addition Paradox with Non-Sapient Animals
You know the drill by now. Take population A and add a new population to it, while very slightly increasing the utility of the original population. This time let's have it be some kind of animal that is capable of feeling pleasure and pain, but is not capable of modeling possible alternative futures and choosing between them (in other words, it is not capable of having "values" or being "moral"). A lizard or a mouse, for example. Each one feels slightly more pleasure than pain in its lifetime, so it can be said to have a life barely worth living. Convert A+ to B. Take the utilons that the human beings are using to experience things like curiosity, beatitude, wisdom, beauty, harmony, morality, and so on, and convert them into pleasure for the animals.
We end up with population Z, with a vast amount of mice or lizards with lives just barely worth living, and a small amount of human beings with lives barely worth living. Terrific! Why do we bother creating humans at all! Let's just create tons of mice and inject them full of heroin! It's a much more efficient way to generate utility!
3. The Benign Addition Paradox with Sociopaths
What new population will we add to A this time? How about some other human beings, who all have anti-social personality disorder? True, they lack the key, crucial value of sympathy that defines so much of human behavior. But they don't seem to miss it. And their lives are barely worth living, so obviously A+ has greater utility than A. If given a chance the sociopaths will reduce the utility of other people to negative levels, but let's assume that that is somehow prevented in this case.
Eventually we get to Z, with a vast population of sociopaths and a small population of normal human beings, all living lives just barely worth living. That has more utility, right? True, the sociopaths place no value on things like friendship, love, compassion, empathy, and so on. And true, the sociopaths are immoral beings who do not care in the slightest about right and wrong. But what does that matter? Utility is being maximized, and surely that is what population ethics is all about!
Let's suppose an asteroid is approaching each of the three population Zs discussed above. It can only be deflected by so much. Your choice is: save the original population of humans from A, or save the vast new population. The choice is obvious. In 1, 2, and 3, each individual has the same level of utility, so obviously we should choose the option that saves a greater number of individuals.

Bam! The asteroid strikes. The end result in all three scenarios is a world in which all the moral creatures are destroyed. It is a world without the many complex values that human beings possess. Each world, for the most part, lacks things like complex challenge, imagination, friendship, empathy, love, and the other complex values that human beings prize. But so what? The purpose of population ethics is to maximize utility, not silly, frivolous things like morality, or the other complex values of the human race. That means that any form of utility that is easier to produce than those values is obviously superior. It's easier to make pleasure and paperclips than it is to make eudaemonia, so that's the form of utility that ought to be maximized, right? And as for making sure moral beings exist, well, that's just ridiculous. The valuable processing power they're using to care about morality could be used to make more paperclips or more mice injected with heroin! Obviously it would be better if they died off, right?
I'm going to go out on a limb and say "Wrong."
Is this realistic?
Now, to be fair, in the Overcoming Bias post I quoted, Robin Hanson also says:
I’m not saying I can’t imagine any possible circumstances where moral creatures shouldn’t die off, but I am saying that those are not ordinary circumstances.
Maybe the scenarios I am proposing are just too extraordinary. But I don't think this is the case. I imagine that the circumstances Robin had in mind were probably something like "either all moral creatures die off, or all moral creatures are tortured 24/7 for all eternity."
Any purely utility-maximizing theory of population ethics that counts both the complex values of human beings, and the pleasure of animals, as "utility" should inevitably draw the conclusion that human beings ought to limit their reproduction to the bare minimum necessary to maintain the infrastructure to sustain a vastly huge population of non-human animals (preferably animals dosed with some sort of pleasure-causing drug). And if some way is found to maintain that infrastructure automatically, without the need for human beings, then the logical conclusion is that human beings are a waste of resources (as are chimps, gorillas, dolphins, and any other animal that is even remotely capable of having values or morality). Furthermore, even if the human race cannot practically be replaced with automated infrastructure, this should be an end result that the adherents of this theory should be yearning for.2 There should be much wailing and gnashing of teeth among moral philosophers that exterminating the human race is impractical, and much hope that someday in the future it will not be.
I call this the "Genocidal Conclusion" or "GC." On the macro level the GC manifests as the idea that the human race ought to be exterminated and replaced with creatures whose preferences are easier to satisfy. On the micro level it manifests as the idea that it is perfectly acceptable to kill someone who is destined to live a perfectly good and worthwhile life and replace them with another person who would have a slightly higher level of utility.
Population Ethics isn't About Maximizing Utility
I am going to make a rather radical proposal. I am going to argue that the consequentialist's favorite maxim, "maximize utility," only applies to scenarios where creating new people or creatures is off the table. I think we need an entirely different ethical framework to describe what ought to be done when it is possible to create new people. I am not by any means saying that "which option would result in more utility" is never a morally relevant consideration when deciding to create a new person, but I definitely think it is not the only one.3
So what do I propose as a replacement to utility maximization? I would argue in favor of a system that promotes a wide range of ideals. Doing some research, I discovered that G. E. Moore had in fact proposed a form of "ideal utilitarianism" in the early 20th century.4 However, I think that "ideal consequentialism" might be a better term for this system, since it isn't just about aggregating utility functions.
What are some of the ideals that an ideal consequentialist theory of population ethics might seek to promote? I've already hinted at what I think they are: Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom... mutual affection, love, friendship, cooperation; all those other important human universals, plus all the stuff in the Fun Theory Sequence. When considering what sort of creatures to create we ought to create creatures that value those things. Not necessarily, all of them, or in the same proportions, for diversity is an important ideal as well, but they should value a great many of those ideals.
Now, lest you worry that this theory has any totalitarian implications, let me make it clear that I am not saying we should force these values on creatures that do not share them. Forcing a paperclip maximizer to pretend to make friends and love people does not do anything to promote the ideals of Friendship and Love. Forcing a chimpanzee to listen while you read the Sequences to it does not promote the values of Truth and Knowledge. Those ideals require both a subjective and objective component. The only way to promote those ideals is to create a creature that includes them as part of its utility function and then help it maximize its utility.
I am also certainly not saying that there is never any value in creating a creature that does not possess these values. There are obviously many circumstances where it is good to create nonhuman animals. There may even be some circumstances where a paperclip maximizer could be of value. My argument is simply that it is most important to make sure that creatures who value these various ideals exist.
I am also not suggesting that it is morally acceptable to casually inflict horrible harms upon a creature with non-human values if we screw up and create one by accident. If promoting ideals and maximizing utility are separate values then it may be that once we have created such a creature we have a duty to make sure it lives a good life, even if it was a bad thing to create it in the first place. You can't unbirth a child.5
It also seems to me that in addition to having ideals about what sort of creatures should exist, we also have ideals about how utility ought to be concentrated. If this is the case then ideal consequentialism may be able to block some forms of the Repugnant Conclusion, even in situations where the only creatures whose creation is being considered are human beings. If it is acceptable to create humans instead of paperclippers, even if the paperclippers would have higher utility, it may also be acceptable to create ten humans with a utility of ten each (total 100) instead of a hundred humans with a utility of 1.01 each (total 101).
Why Did We Become Convinced that Maximizing Utility was the Sole Good?
Population ethics was, until comparatively recently, a fallow field in ethics. And in situations where there is no option to increase the population, maximizing utility is the only consideration that's really relevant. If you've created creatures that value the right ideals, then all that is left to be done is to maximize their utility. If you've created creatures that do not value the right ideals, there is no value to be had in attempting to force them to embrace those ideals. As I've said before, you will not promote the values of Love and Friendship by creating a paperclip maximizer and forcing it to pretend to love people and make friends.
So in situations where the population is constant, "maximize utility" is a decent approximation of the meaning of right. It's only when the population can be added to that morality becomes much more complicated.
Another thing to blame is human-centric reasoning. When people defend the Repugnant Conclusion they tend to point out that a life barely worth living is not as bad as it would seem at first glance. They emphasize that it need not be a boring life, it may be a life full of ups and downs where the ups just barely outweigh the downs. A life worth living, they say, is a life one would choose to live. Derek Parfit developed this idea to some extent by arguing that there are certain values that are "discontinuous" and that one needs to experience many of them in order to truly have a life worth living.
The Orthogonality Thesis throws all these arguments out the window. It is possible to create an intelligence to execute any utility function, no matter what it is. If human beings have all sorts of complex needs that must be fulfilled in order for them to lead worthwhile lives, then you could create more worthwhile lives by killing the human race and replacing them with something less finicky. Maybe happy cows. Maybe paperclip maximizers. Or how about some creature whose only desire is to live for one second and then die? If we created such a creature and then killed it, we would reap huge amounts of utility, for we would have created a creature that got everything it wanted out of life!
How Intuitive is the Mere Addition Principle, Really?
I think most people would agree that morality should exist, and that therefore any system of population ethics should not lead to the Genocidal Conclusion. But which step in the Benign Addition Paradox should we reject? We could reject the step where utility is redistributed. But that seems wrong; most people seem to consider it bad for animals and sociopaths to suffer, and acceptable to inflict at least some amount of disutility on human beings to prevent such suffering.
It seems more logical to reject the Mere Addition Principle. In other words, maybe we ought to reject the idea that the mere addition of more lives-worth-living cannot make the world worse. And in turn, we should probably also reject the Benign Addition Principle. Adding more lives-worth-living may be capable of making the world worse, even if doing so also slightly benefits existing people. Fortunately this isn't a very hard principle to reject. While many moral philosophers treat it as obviously correct, nearly everyone else rejects this principle in day-to-day life.
Now, I'm obviously not saying that people's behavior in their day-to-day lives is always good; it may be that they are morally mistaken. But I think the fact that so many people seem to implicitly reject it provides some sort of evidence against it.
Take people's decision to have children. Many people choose to have fewer children than they otherwise would because they do not believe they will be able to adequately care for them, at least not without inflicting large disutilities on themselves. If most people accepted the Mere Addition Principle there would be a simple solution for this: have more children and then neglect them! True, the children's lives would be terrible while they were growing up, but once they've grown up and are on their own there's a good chance they may be able to lead worthwhile lives. Not only that, it may be possible to trick the welfare system into giving you money for the children you neglect, which would satisfy the Benign Addition Principle.
Yet most people choose not to have children and then neglect them. Furthermore, they seem to think that they have a moral duty not to do so: that a world where they choose not to have neglected children is better than one where they do. What is wrong with them?
Another example is a common political view many people have. Many people believe that impoverished people should have fewer children because of the burden doing so would place on the welfare system. They also believe that it would be bad to get rid of the welfare system altogether. If the Benign Addition Principle were as obvious as it seems, they would instead advocate for the abolition of the welfare system, and encourage impoverished people to have more children. Assuming most impoverished people live lives worth living, this is exactly analogous to the BAP: it would create more people while benefiting existing ones (those who pay lower taxes once the welfare system is abolished).
Yet again, most people choose to reject this line of reasoning. The BAP does not seem to be an obvious and intuitive principle at all.
The Genocidal Conclusion is Really Repugnant
There is nearly nothing more repugnant than the Genocidal Conclusion. Pretty much the only way a line of moral reasoning could go more wrong would be concluding that we have a moral duty to cause suffering, as an end in itself. This means that it's fairly easy to counter any argument for total utilitarianism that points out that the alternative I am promoting has odd conclusions which clash with some of our moral intuitions, while total utilitarianism does not. Simply ask: is that odd conclusion more insane than the Genocidal Conclusion? If it isn't, total utilitarianism should still be rejected.
Ideal Consequentialism Needs a Lot of Work
I do think that Ideal Consequentialism needs some serious ironing out. I haven't really developed it into a logical and rigorous system; at this point it's barely even a rough framework. There are many questions that stump me. In particular I am not quite sure what population principle I should develop. It's hard to develop one that rejects the MAP without leading to weird conclusions, like that it's bad to create someone of high utility if a population of even higher utility existed long ago. It's a difficult problem to work on, and it would be interesting to see if anyone else had any ideas.
But just because I don't have an alternative fully worked out doesn't mean I can't reject Total Utilitarianism. It leads to the conclusion that a world with no love, curiosity, complex challenge, friendship, morality, or any other value the human race holds dear is an ideal, desirable world, if there is a sufficient amount of some other creature with a simpler utility function. Morality should exist, and because of that, total utilitarianism must be rejected as a moral system.
1. I have been asked to note that when I use the phrase "utility" I am usually referring to a concept that is called "E-utility," rather than the Von Neumann-Morgenstern utility that is sometimes discussed in decision theory. The difference is that in VNM one's moral views are included in one's utility function, whereas in E-utility they are not. So if one chooses to harm oneself to help others because one believes that is morally right, one has higher VNM utility, but lower E-utility.
2. There is a certain argument against the Repugnant Conclusion which holds that, as the steps of the Mere Addition Paradox are followed, the world will lose its last symphony, its last great book, and so on. I have always considered this to be an invalid argument, because the world of the RC doesn't necessarily have to be one where these things don't exist; it could be one where they exist, but are enjoyed very rarely. The Genocidal Conclusion brings this argument back in force. Creating creatures that can appreciate symphonies and great books is very inefficient compared to creating bunny rabbits pumped full of heroin.
3. Total Utilitarianism was originally introduced to population ethics as a possible solution to the Non-Identity Problem. I certainly agree that such a problem needs a solution, even if Total Utilitarianism doesn't work out as that solution.
4. I haven't read a lot of Moore, most of my ideas were extrapolated from other things I read on Less Wrong. I just mentioned him because in my research I noticed his concept of "ideal utilitarianism" resembled my ideas. While I do think he was on the right track he does commit the Mind Projection Fallacy a lot. For instance, he seems to think that one could promote beauty by creating beautiful objects, even if there were no creatures with standards of beauty around to appreciate them. This is why I am careful to emphasize that to promote ideals like love and beauty one must create creatures capable of feeling love and experiencing beauty.
5. My tentative answer to the question Eliezer poses in "You Can't Unbirth a Child" is that human beings may have a duty to allow the cheesecake maximizers to build some amount of giant cheesecakes, but they would also have a moral duty to limit such creatures' reproduction in order to spare resources to create more creatures with humane values.
EDITED: To make a point about ideal consequentialism clearer, based on AlexMennen's criticisms.
Stuart has worked on further developing the orthogonality thesis, which gave rise to a paper, a non-final version of which you can see here: http://lesswrong.com/lw/cej/general_purpose_intelligence_arguing_the/
This post won't make sense if you haven't been through that.
Today we spent some time going over it and he accepted my suggestion of a minor amendment, which best fits here.
Besides all the other awkward things that a moral convergentist would have to argue for, namely:
This argument generalises to other ways of producing the AI. Thus to deny the Orthogonality thesis is to assert that there is a goal system G, such that, among other things:
- There cannot exist any efficient real-world algorithm with goal G.
- If a being with arbitrarily high resources, intelligence, time and goal G were to try to design an efficient real-world algorithm with the same goal, it must fail.
- If a human society were highly motivated to design an efficient real-world algorithm with goal G, and were given a million years to do so along with huge amounts of resources, training and knowledge about AI, it must fail.
- If a high-resource human society were highly motivated to achieve the goals of G, then it could not do so (here the human society is seen as the algorithm).
- Same as above, for any hypothetical alien societies.
- There cannot exist any pattern of reinforcement learning that would train a highly efficient real-world intelligence to follow the goal G.
- There cannot exist any evolutionary or environmental pressures that would evolve highly efficient real-world intelligences to follow goal G.
While doing some reading on philosophy I came across some interesting questions about the nature of having desires and preferences. One, do you still have preferences and desires when you are unconscious? Two, if you don't, does this call into question the many moral theories that hold that having preferences and desires is what makes one morally significant, since mistreating temporarily unconscious people seems obviously immoral?
Philosophers usually discuss this question when debating the morality of abortion, but to avoid doing any mindkilling I won't mention that topic, except to say in this sentence that I won't mention it.
In more detail, the issue is this: a common, intuitive, and logical-seeming explanation for why it is immoral to destroy a typical human being, but not to destroy a rock, is that a typical human being has certain desires (or preferences or values, whatever you wish to call them; I'm using the terms interchangeably) that they wish to fulfill, and destroying them would hinder the fulfillment of these desires. A rock, by contrast, does not have any such desires, so it is not harmed by being destroyed. The problem with this is that it also seems immoral to harm a human being who is asleep, or is in a temporary coma. And, on the face of it, it seems plausible to say that an unconscious person does not have any desires. (And of course it gets even weirder when considering far-out concepts like a brain emulator that is saved to a hard drive, but isn't being run at the moment.)
After thinking about this it occurred to me that this line of reasoning could be taken further. If I am not thinking about my car at the moment, can I still be said to desire that it is not stolen? Do I stop having desires about things the instant my attention shifts away from them?
I have compiled a list of possible solutions to this problem, ranked in order from least plausible to most plausible.
1. One possibility would be to consider it immoral to harm a sleeping person because they will have desires in the future, even if they don't have any now. I find this argument extremely implausible because it has some extremely bizarre implications, some of which may lead to insoluble moral contradictions. For instance, this argument could be used to argue that it is immoral to destroy skin cells, because it is possible to use them to clone a new person, who will eventually grow up to have desires.
Furthermore, when human beings eventually gain the ability to build AIs that possess desires, this solution interacts with the orthogonality thesis in a catastrophic fashion. If it is possible to build an AI with any utility function, then for every potential AI one can construct, there is another potential AI that desires the exact opposite. That leads to total paralysis, since for every potential set of desires we are capable of satisfying there is another potential set that would be horribly thwarted.
Lastly, this argument implies that you can (and may be obligated to) help someone who doesn't exist, and never has existed, by satisfying their non-personal preferences, without ever having to bother with actually creating them. This seems strange; I can perhaps see an argument for respecting the once-existent preferences of those who are dead, but respecting the hypothetical preferences of the never-existed seems absurd. It also has the same problems with the orthogonality thesis that I mentioned earlier.
2. Make the same argument as solution 1, but somehow define the categories more narrowly, so that an unconscious person's ability to have desires in the future differs from that of an uncloned skin cell or an unbuilt AI. Michael Tooley has tried to do this by distinguishing between things that have the "possibility" of becoming a person with desires (e.g. skin cells) and things that have the "capacity" to have desires. This approach has been criticized, and I find myself pessimistic about it, because categories in real life have a tendency to be "fuzzy" and lack sharp borders.
3. Another solution may be that desires that one has had in the past continue to count, even when one is unconscious or not thinking about them. So it's immoral to harm unconscious people because before they were unconscious they had a desire not to be harmed, and it's immoral to steal my car because I desired that it not be stolen earlier when I was thinking about it.
I find this solution fairly convincing. The only major quibble I have with it is that it gives what some might consider a counter-intuitive result on a variation of the sleeping-person question. Imagine a nano-factory manufactures a sleeping person. This person is a new and distinct individual, and when they wake up they will proceed to behave as a typical human. This solution may suggest that it is okay to kill them before they wake up, since they haven't had any desires yet, which does seem odd.
4. Reject the claim that one doesn't have desires when one is unconscious, or when one is not thinking about a topic. The more I think about this solution, the more obvious it seems. Generally, when I am rationally deliberating about whether or not I desire something, I consider how many of my values and ideals it fulfills. It seems like my list of values and ideals remains fairly constant, and that even if I am focusing my attention on one value at a time, it makes sense to say that I still "have" the other values I am not focusing on at the moment.
Obviously I don't think that there's some portion of my brain where my "values" are stored in a neat little Excel spreadsheet. But they do seem to be a persistent part of its structure in some fashion. And it makes sense that they'd still be part of its structure when I'm unconscious. If they weren't, wouldn't my preferences change radically every time I woke up?
In other words, it's bad to harm an unconscious person because they have desires, preferences, values, whatever you wish to call them, that harming them would violate. And those values are a part of the structure of their mind that doesn't go away when they sleep. Skin cells and unbuilt AIs, by contrast, have no such values.
Now, while I think that explanation 4 resolves the issue of desires and unconsciousness best, I do think solution 3 has a great deal of truth to it as well (for instance, I tend to respect the final wishes of a dead person because they had desires in the past, even if they don't now). Solutions 3 and 4 are not incompatible at all, so one can believe in both of them.
I'm curious as to what people think of my possible solutions. Am I right about people still having something like desires in their brain when they are unconscious?
So morality has a lot to do with logic — indeed I have argued that moral reasoning is a type of applied logical reasoning — but it is not logic “all the way down,” it is anchored by certain contingent facts about humanity, bonoboness and so forth.
But, despite Yudkowsky’s confident claim, morality isn’t a matter of logic “all the way down,” because it has to start with some axioms, some brute facts about the type of organisms that engage in moral reasoning to begin with. Those facts don’t come from physics (though, like everything else, they better be compatible with all the laws of physics), they come from biology. A reasonable theory of ethics, then, can emerge only from a combination of biology (by which I mean not just evolutionary biology, but also cultural evolution) and logic.
Let's imagine a life extension drug has been discovered. One dose of this drug extends one's life by 49.99 years. This drug also has a mild cumulative effect: if it is given to someone who has been dosed with it before, it will extend their life by 50 years.
Under these constraints the most efficient way to maximize the amount of life extension this drug can produce is to give every dose to one individual. If there were one dose available for each of the seven billion people alive on Earth, then giving every person one dose would result in a total of 349,930,000,000 years of life gained. If one person were given all the doses, a total of 349,999,999,999.99 years of life would be gained. Sharing the life extension drug equally would result in a net loss of almost 70 million years of life. If you're concerned about people's reaction to this policy, we could make it a big lottery, where every person on Earth gets a chance to gamble their dose for a chance at all of them.
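For readers who want to check the arithmetic, here is a minimal Python sketch (the constants are just the numbers from the scenario above):

```python
# Sanity check of the life-extension arithmetic from the scenario.
PEOPLE = 7_000_000_000
FIRST, LATER = 49.99, 50.00   # years from a first dose / a repeat dose

shared = PEOPLE * FIRST                  # everyone gets exactly one dose
hoarded = FIRST + (PEOPLE - 1) * LATER   # one person gets every dose

print(f"{shared:,.2f} years if shared")       # 349,930,000,000.00
print(f"{hoarded:,.2f} years if hoarded")     # 349,999,999,999.99
print(f"{hoarded - shared:,.2f} years lost by sharing")  # 69,999,999.99
```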
Now, one could make certain moral arguments in favor of sharing the drug. I'll get to those later. However, it seems to me that gambling your dose for a chance at all of them isn't rational from a purely self-interested point of view either. You will not win the lottery. Your chances of winning this particular lottery are about one in seven billion, roughly forty times worse than your chances of winning the Powerball jackpot. If someone gave me a dose of the drug, and then offered me a chance to gamble in this lottery, I'd accuse them of Pascal's mugging.
Here's an even scarier thought experiment. Imagine we invent the technology for whole brain emulation. Let "x" equal the amount of resources it takes to sustain a WBE through 100 years of life. Let's imagine that with this particular type of technology, it costs 10x to convert a human into a WBE and it costs 100x to sustain a biological human through the course of their natural life. Let's have the cost of making multiple copies of a WBE once they have been converted be close to 0.
Again, under these constraints it seems like the most effective way to maximize the amount of life extension done is to convert one person into a WBE, then kill everyone else and use the resources that were sustaining them to make more WBEs, or extend the life of more WBEs. Again, if we are concerned about people's reaction to this policy we could make it a lottery. And again, if I was given a chance to play in this lottery I would turn it down and consider it a form of Pascal's mugging.
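A rough back-of-the-envelope comparison of the two options, assuming (my assumption, since the scenario leaves it unspecified) that a biological human has about 80 remaining life-years:

```python
# Back-of-the-envelope for the WBE scenario. X is an arbitrary resource
# unit; NATURAL_LIFE is an assumption the scenario does not pin down.
X = 1.0
POP = 7_000_000_000
NATURAL_LIFE = 80   # assumed remaining biological life-years per person

budget = 100 * X * POP   # the cost of sustaining everyone biologically

bio_years = POP * NATURAL_LIFE             # keep everyone biological
wbe_years = (budget - 10 * X) / X * 100    # convert one person, run WBEs

print(f"{bio_years:.2e} biological life-years")  # ~5.60e+11
print(f"{wbe_years:.2e} WBE life-years")         # ~7.00e+13, about 125x more
```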
I'm sure that most readers, like myself, would find these policies very objectionable. However, I have trouble finding objections to them from the perspective of classical utilitarianism. Indeed, most people have probably noticed that these scenarios are very similar to Nozick's "utility monster" thought experiment. I have made a list of possible objections to these scenarios that I have been considering:
1. First, let's deal with the unsatisfying practical objections. In the case of the drug example, it seems likely that a more efficient form of life extension will be developed in the future. In that case it would be better to give everyone the drug to sustain them until that time. However, this objection, like most practical ones, seems unsatisfying. It seems like there are strong moral objections to not sharing the drug.
Another pragmatic objection is that, in the case of the drug scenario, the lucky winner of the lottery might miss their friends and relatives who have died. And in the WBE scenario it seems like the lottery winner might get lonely being the only person on Earth. But again, this is unsatisfying. If the lottery winner were allowed to share their winnings with their immediate social circle, or if they were a sociopathic loner who cared nothing for others, it still seems bad that they end up killing everyone else on Earth.
2. One could use the classic utilitarian argument in favor of equality: diminishing marginal utility. However, I don't think this works. Humans don't seem to experience diminishing returns from lifespan in the same way they do from wealth. It's absurd to argue that a person who lives to the ripe old age of 60 generates less utility than two people who die at age 30 (all other things being equal). The reason the diminishing-marginal-utility argument works when arguing for equality of wealth is that people are limited in their ability to get utility from their wealth, because there is only so much time in the day to spend enjoying it. Extended lifespan removes this restriction, making a longer-lived person essentially a utility monster.
3. My intuitions about the lottery could be mistaken. It seems to me that if I were offered the possibility of gambling my dose of life extension drug with just one other person, I still wouldn't do it. If I understand probabilities correctly, gambling for a chance at living either 0 or 99.99 additional years is equivalent, in expectation, to a certain additional 49.995 years of life, which is better than the certain 49.99 years of life I'd have if I didn't make the gamble. But I still wouldn't do it, partly because I'd be afraid I'd lose and partly because I wouldn't want to kill the person I was gambling with.
So maybe my horror at these scenarios is driven by that same hesitancy. Maybe I just don't understand the probabilities right. But even if that is the case, even if it is rational for me to gamble my dose with just one other person, it doesn't seem like the gambling would scale. I will not win the "lifetime lottery."
4. Finally, we have those moral objections I mentioned earlier. Utilitarianism is a pretty awesome moral theory under most circumstances. However, when it is applied to scenarios involving population growth, or scenarios where one individual is vastly better at converting resources into utility than their fellows, it tends to produce very scary results. If we accept the complexity of value thesis (and I think we should), this suggests that there are other moral values that are not salient in the "special case" of scenarios without population growth or utility monsters, but that become relevant in scenarios which include them.
For instance, it may be that prioritarianism is better than pure utilitarianism, in which case sharing the life extension method might be best because of the benefits it accords the least well off. Or it may be (in the case of the WBE example) that having a large number of unique, worthwhile lives in the world is valuable because it produces experiences like love, friendship, and diversity.
My tentative guess at the moment is that there probably are some other moral values that make the scenarios I described morally suboptimal, even though they seem to make sense from a utilitarian perspective. However, I'm interested in what other people think. Maybe I'm missing something really obvious.
EDIT: To make it clear, when I refer to "amount of years added" I am assuming for simplicity's sake that all the years added are years that the person whose life is being extended wants to live and contain a large amount of positive experiences. I'm not saying that lifespan is exactly equivalent to utility. The problem I am trying to resolve is that it seems like the scenarios I've described seem to maximize the number of positive events it is possible for the people in the scenario to experience, even though they involve killing the majority of people involved. I'm not sure "positive experiences" is exactly equivalent to "utility" either, but it's likely a much closer match than lifespan.
What do I mean by "morality isn't logical"? I mean in the same sense that mathematics is logical but literary criticism isn't: the "reasoning" we use to think about morality doesn't resemble logical reasoning. All systems of logic, that I'm aware of, have a concept of proof and a method of verifying with high degree of certainty whether an argument constitutes a proof. As long as the logic is consistent (and we have good reason to think that many of them are), once we verify a proof we can accept its conclusion without worrying that there may be another proof that makes the opposite conclusion. With morality though, we have no such method, and people all the time make moral arguments that can be reversed or called into question by other moral arguments. (Edit: For an example of this, see these posts.)
Without being a system of logic, moral philosophical reasoning likely (or at least plausibly) doesn't have any of the nice properties that a well-constructed system of logic would have, for example, consistency, validity, soundness, or even the more basic property that considering arguments in a different order, or in a different mood, won't cause a person to accept an entirely different set of conclusions. For all we know, somebody trying to reason about a moral concept like "fairness" may just be taking a random walk as they move from one conclusion to another based on moral arguments they encounter or think up.
In a recent post, Eliezer said "morality is logic", by which he seems to mean... well, I'm still not exactly sure what, but one interpretation is that a person's cognition about morality can be described as an algorithm, and that algorithm can be studied using logical reasoning. (Which of course is true, but in that sense both math and literary criticism as well as every other subject of human study would be logic.) In any case, I don't think Eliezer is explicitly claiming that an algorithm-for-thinking-about-morality constitutes an algorithm-for-doing-logic, but I worry that the characterization of "morality is logic" may cause some connotations of "logic" to be inappropriately sneaked into "morality". For example Eliezer seems to (at least at one point) assume that considering moral arguments in a different order won't cause a human to accept an entirely different set of conclusions, and maybe this is why. To fight this potential sneaking of connotations, I suggest that when you see the phrase "morality is logic", remind yourself that morality isn't logical.
In a previous post, I argued that nihilism is often short-changed around here. However, I'm far from certain that it is correct, and in the meantime I think we should be careful not to discard our values one at a time by engaging in "selective nihilism" when faced with an ontological crisis, without even realizing that's what's happening. Karl recently reminded me of the post Timeless Identity by Eliezer Yudkowsky, which I noticed seems to be an instance of this.
As I mentioned in the previous post, our values seem to be defined in terms of a world model where people exist as ontologically primitive entities ruled heuristically by (mostly intuitive understandings of) physics and psychology. In this kind of decision system, both identity-as-physical-continuity and identity-as-psychological-continuity make perfect sense as possible values, and it seems humans do "natively" have both values. A typical human being is both reluctant to step into a teleporter that works by destructive scanning, and unwilling to let their physical structure be continuously modified into a psychologically very different being.
If faced with the knowledge that physical continuity doesn't exist in the real world at the level of fundamental physics, one might conclude that it's crazy to continue to value it, and this is what Eliezer's post argued. But if we apply this reasoning in a non-selective fashion, wouldn't we also conclude that we should stop valuing things like "pain" and "happiness" which also do not seem to exist at the level of fundamental physics?
In our current environment, there is widespread agreement among humans as to which macroscopic objects at time t+1 are physical continuations of which macroscopic objects existing at time t. We may not fully understand what exactly it is we're doing when judging such physical continuity, and the agreement tends to break down when we start talking about more exotic situations, and if/when we do fully understand our criteria for judging physical continuity it's unlikely to have a simple definition in terms of fundamental physics, but all of this is true for "pain" and "happiness" as well.
I suggest we keep all of our (potential/apparent) values intact until we have a better handle on how we're supposed to deal with ontological crises in general. If we convince ourselves that we should discard some value, and that turns out to be wrong, the error may be unrecoverable once we've lived with it long enough.
Imagine a robot that was designed to find and collect spare change around its owner's house. It had a world model where macroscopic everyday objects are ontologically primitive and ruled by high-school-like physics and (for humans and their pets) rudimentary psychology and animal behavior. Its goals were expressed as a utility function over this world model, which was sufficient for its designed purpose. All went well until one day, a prankster decided to "upgrade" the robot's world model to be based on modern particle physics. This unfortunately caused the robot's utility function to instantly throw a domain error exception (since its inputs are no longer the expected list of macroscopic objects and associated properties like shape and color), thus crashing the controlling AI.
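To make the failure mode concrete, here is a minimal Python sketch; the class and function names are hypothetical, invented purely for illustration, not anyone's actual design:

```python
# Toy illustration of the spare-change robot's crash. All names hypothetical.
class MacroObject:
    """An ontologically primitive everyday object in the old world model."""
    def __init__(self, kind, value=0.0):
        self.kind = kind      # e.g. "coin", "couch", "cat"
        self.value = value    # monetary value, if any

def utility(world):
    # Defined only over lists of MacroObjects: total value of loose coins.
    return sum(obj.value for obj in world if obj.kind == "coin")

old_world = [MacroObject("coin", 0.25), MacroObject("couch")]
print(utility(old_world))   # 0.25 -- works fine

# The prankster's "upgrade": the world is now particles, not objects.
new_world = [("electron", (0.1, 0.2, 0.3)), ("up-quark", (0.4, 0.5, 0.6))]
print(utility(new_world))   # AttributeError -- the utility function's
                            # domain no longer matches the world model
```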
According to Peter de Blanc, who used the phrase "ontological crisis" to describe this kind of problem,
Human beings also confront ontological crises. We should find out what cognitive algorithms humans use to solve the same problems described in this paper. If we wish to build agents that maximize human values, this may be aided by knowing how humans re-interpret their values in new ontologies.
I recently realized that a couple of problems that I've been thinking over (the nature of selfishness and the nature of pain/pleasure/suffering/happiness) can be considered instances of ontological crises in humans (although I'm not so sure we necessarily have the cognitive algorithms to solve them). I started thinking in this direction after writing this comment:
This formulation or variant of TDT requires that before a decision problem is handed to it, the world is divided into the agent itself (X), other agents (Y), and "dumb matter" (G). I think this is misguided, since the world doesn't really divide cleanly into these 3 parts.
What struck me is that even though the world doesn't divide cleanly into these 3 parts, our models of the world actually do. In the world models that we humans use on a day to day basis, and over which our utility functions seem to be defined (to the extent that we can be said to have utility functions at all), we do take the Self, Other People, and various Dumb Matter to be ontologically primitive entities. Our world models, like the coin collecting robot's, consist of these macroscopic objects ruled by a hodgepodge of heuristics and prediction algorithms, rather than microscopic particles governed by a coherent set of laws of physics.
For example, the amount of pain someone is experiencing doesn't seem to exist in the real world as an XML tag attached to some "person entity", but that's pretty much how our models of the world work, and perhaps more importantly, that's what our utility functions expect their inputs to look like (as opposed to, say, a list of particles and their positions and velocities). Similarly, a human can be selfish just by treating the object labeled "SELF" in its world model differently from other objects, whereas an AI with a world model consisting of microscopic particles would need to somehow inherit or learn a detailed description of itself in order to be selfish.
To fully confront the ontological crisis that we face, we would have to upgrade our world model to be based on actual physics, and simultaneously translate our utility functions so that their domain is the set of possible states of the new model. We currently have little idea how to accomplish this, and instead what we do in practice is, as far as I can tell, keep our ontologies intact and utility functions unchanged, but just add some new heuristics that in certain limited circumstances call out to new physics formulas to better update/extrapolate our models. This is actually rather clever, because it lets us make use of updated understandings of physics without ever having to, for instance, decide exactly what patterns of particle movements constitute pain or pleasure, or what patterns constitute oneself. Nevertheless, this approach hardly seems capable of being extended to work in a future where many people may have nontraditional mind architectures, or have a zillion copies of themselves running on all kinds of strange substrates, or be merged into amorphous group minds with no clear boundaries between individuals.
By the way, I think nihilism often gets short-changed around here. Given that we do not actually have at hand a solution to ontological crises in general, or to the specific crisis that we face, what's wrong with saying that the solution set may just be null? Given that evolution doesn't constitute a particularly benevolent and farsighted designer, perhaps we may not be able to do much better than that poor spare-change-collecting robot? If Eliezer is worried that actual AIs facing actual ontological crises could do worse than just crash, should we be very sanguine that for humans everything must "add up to moral normality"?
To expand a bit more on this possibility, many people have an aversion against moral arbitrariness, so we need at a minimum a utility translation scheme that's principled enough to pass that filter. But our existing world models are a hodgepodge put together by evolution so there may not be any such sufficiently principled scheme, which (if other approaches to solving moral philosophy also don't pan out) would leave us with legitimate feelings of "existential angst" and nihilism. One could perhaps still argue that any current such feelings are premature, but maybe some people have stronger intuitions than others that these problems are unsolvable?
Do we have any examples of humans successfully navigating an ontological crisis? The LessWrong Wiki mentions loss of faith in God:
In the human context, a clear example of an ontological crisis is a believer’s loss of faith in God. Their motivations and goals, coming from a very specific view of life suddenly become obsolete and maybe even nonsense in the face of this new configuration. The person will then experience a deep crisis and go through the psychological task of reconstructing its set of preferences according the new world view.
But I don't think loss of faith in God actually constitutes an ontological crisis, or if it does, certainly not a very severe one. An ontology consisting of Gods, Self, Other People, and Dumb Matter just isn't very different from one consisting of Self, Other People, and Dumb Matter (the latter could just be considered a special case of the former with quantity of Gods being 0), especially when you compare either ontology to one made of microscopic particles or even less familiar entities.
But to end on a more positive note, realizing that seemingly unrelated problems are actually instances of a more general problem gives some hope that by "going meta" we can find a solution to all of these problems at once. Maybe we can solve many ethical problems simultaneously by discovering some generic algorithm that can be used by an agent to transition from any ontology to another?
(Note that I'm not saying this is the right way to understand one's real preferences/morality, but just drawing attention to it as a possible alternative to other more "object level" or "purely philosophical" approaches. See also this previous discussion, which I recalled after writing most of the above.)
I propose that it is altruistic to be replaceable, and that therefore those who strive to be altruistic should strive to be replaceable.
As far as I can Google, this does not seem to have been proposed before. LW should be a good place to discuss it. A community interested in rational and ethical behavior, and in how superintelligent machines may decide to replace mankind, should at least bother to refute the following argument.
Replaceability is "the state of being replaceable". It isn't binary. The price of the replacement matters: so a cookie is more replaceable than a big wedding cake. Adequacy of the replacement also makes a difference: a piston for an ancient Rolls Royce is less replaceable than one in a modern car, because it has to be hand-crafted and will be distinguishable. So something is more or less replaceable depending on the price and quality of its replacement.
Replaceability could be thought of as the inverse of the cost of having to replace something. Something that's very replaceable has a low cost of replacement, while something that lacks replaceability has a high (up to unfeasible) cost of replacement. The cost of replacement plays into Total Cost of Ownership, and everything economists know about that applies. It seems pretty obvious that replaceability of possessions is good, much like cheap availability is good.
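As a toy formalization of this (my own, purely illustrative, not an established measure), one could score replaceability as the adequacy of the replacement divided by its cost:

```python
def replaceability(replacement_cost, adequacy):
    """Toy score: adequacy of the replacement (0 < adequacy <= 1)
    divided by the cost of obtaining it. Higher = more replaceable."""
    return adequacy / replacement_cost

# Hypothetical numbers, echoing the examples above:
print(replaceability(replacement_cost=2.0, adequacy=1.0))    # a cookie
print(replaceability(replacement_cost=500.0, adequacy=0.9))  # a wedding cake
```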
Some things (historical artifacts, art pieces) are valued highly precisely because of their irreplaceability. Although a few things could be said about the resale value of such objects, I'll simplify and contend these valuations are not rational.
The practical example
Anne manages the central database of Beth's company. She's the only one who has access to that database, the skillset required for managing it, and an understanding of how it all works; she has a monopoly on that combination.
This monopoly gives Anne control over her own replacement cost. If she works according to the state of the art, writes extensive and up-to-date documentation, makes proper backups, etc., she can be very replaceable, because her monopoly will be easily broken. If she refuses to explain what she's doing, creates weird and fragile workarounds, and documents the database badly, she can reduce her replaceability and defend her monopoly. (A well-obfuscated database can take months for a replacement database manager to handle confidently.)
So Beth may still choose to replace Anne, but Anne can influence how expensive that'll be for Beth. She can at least make sure her replacement needs to be shown the ropes, so she can't be fired on a whim. But she might go further and practically hold the database hostage, which would certainly help her in salary negotiations if she does it right.
This makes it pretty clear how Anne can act altruistically in this situation, and how she can act selfishly. Doesn't it?
The moral argument
To Anne, her replacement cost is an externality and an influence on the length and terms of her employment. To maximize the length of her employment and her salary, her replacement cost would have to be high.
To Beth, Anne's replacement cost is part of the cost of employing her and of course she wants it to be low. This is true for any pair of employer and employee: Anne is unusual only in that she has a great degree of influence on her replacement cost.
Therefore, if Anne documents her database properly etc, this increases her replaceability and constitutes altruistic behavior. Unless she values the positive feeling of doing her employer a favor more highly than she values the money she might make by avoiding replacement, this might even be true altruism.
Unless I suck at Google, replaceability doesn't seem to have been discussed as an aspect of altruism. The two reasons for that I can see are:
- replacing people is painful to think about
- and it seems futile as long as people aren't replaceable in more than very specific functions anyway.
But we don't want or get the choice to kill one person to save the life of five, either, and such practical improbabilities shouldn't stop us from considering our moral decisions. This is especially true in a world where copies, and hence replacements, of people are starting to look possible at least in principle.
- In some reasonably-near future, software is getting better at modeling people. We still don't know what makes a process intelligent, but we can feed a couple of videos and a bunch of psychological data points into a people modeler, extrapolate everything else using a standard population and the resulting model can have a conversation that could fool a four-year-old. The technology is already good enough for models of pets. While convincing models of complex personalities are at least another decade away, the tech is starting to become good enough for senile grandmothers.
Obviously no-one wants granny to die. But the kids would like to keep a model of granny, and they'd like to make the model before the Alzheimer's gets any worse, while granny is terrified she'll get no more visits to her retirement home.
What's the ethical thing to do here? Surely the relatives should keep visiting granny. Could granny maybe have a model made, but keep it to herself, for release only through her Last Will and Testament? And wouldn't it be truly awful of her to refuse to do that?
- Only slightly further into the future, we're still mortal, but cryonics does appear to be working. Unfrozen people need regular medical aid, but the technology is only getting better and anyway, the point is: something we can believe to be them can indeed come back.
Some refuse to wait out these Dark Ages; they get themselves frozen for nonmedical reasons, to fastforward across decades or centuries into a time when the really awesome stuff will be happening, and to get the immortality technologies they hope will be developed by then.
In this scenario, wouldn't fastforwarders be considered selfish, because they impose on their friends the pain of their absence? And wouldn't their friends mind it less if the fastforwarders went to the trouble of having a good model (see above) made first?
- On some distant future Earth, minds can be uploaded completely. Brains can be modeled and recreated so effectively that people can make living, breathing copies of themselves and experience the inability to tell which instance is the copy and which is the original.
Of course many adherents of soul theories reject this as blasphemous. A few more sophisticated thinkers worry that this devalues individuals to the point where superhuman AIs might conclude that, as long as copies of everyone are stored on some hard drive orbiting Pluto, nothing of value is lost if every meatbody gets devoured into more hardware. The bottom line is: effective immortality is available, but some refuse it on principle.
In this world, wouldn't those who make themselves fully and infinitely replaceable want the same for everyone they love? Wouldn't they consider it a dreadful imposition if a friend or relative refused immortality? After all, wasn't not having to say goodbye anymore kind of the point?
These questions haven't come up in the real world because people have never been replaceable in more than very specific functions. But I hope you'll agree that if and when people become more replaceable, that will be regarded as a good thing, and it will be regarded as virtuous to use these technologies as they become available, because it spares one's friends and family some or all of the cost of replacing oneself.
Replaceability as an altruist virtue
And if replaceability is altruistic in this hypothetical future, as well as in the limited sense of Anne and Beth, that implies replaceability is altruistic now. And even now, there are things we can do to increase our replaceability, i.e. to reduce the cost our bereaved will incur when they have to replace us. We can teach all our (valuable) skills, so others can replace us as providers of these skills. We can not have (relevant) secrets, so others can learn what we know and replace us as sources of that knowledge. We can endeavour to live as long as possible, to postpone the cost. We can sign up for cryonics. There are surely other things each of us could do to increase our replaceability, but I can't think of any an altruist wouldn't consider virtuous.
As an altruist, I conclude that replaceability is a prosocial, unselfish trait, something we'd want our friends to have, in other words: a virtue. I'd go as far as to say that even bothering to set up a good Last Will and Testament is virtuous precisely because it reduces the cost my bereaved will incur when they have to replace me. And although none of us can be truly easily replaceable as of yet, I suggest we honor those who make themselves replaceable, and are proud of whatever replaceability we ourselves attain.
So, how replaceable are you?
In Robert Nozick's famous "Utility Monster" thought experiment he proposes the idea of a creature that does not receive diminishing marginal utility from resource consumption, and argues that this poses a problem for utilitarian ethics. Why? Utilitarian ethics, while highly egalitarian in real life situations, does not place any intrinsic value on equality. The reason utilitarian ethics tend to favor equality is that human beings seem to experience diminishing returns when converting resources into utility. Egalitarianism, according to this framework, is good because sharing resources between people reduces the level of diminishing returns and maximizes the total amount of utility people generate, not because it's actually good for people to have equal levels of utility.
The problem the Utility Monster poses is that, since it does not receive diminishing marginal utility, there is no reason, under a traditional utilitarian framework, to share resources between it and the other inhabitants of the world it lives in. It would be completely justified in killing other people and taking their things for itself, or enslaving them for its own benefit. This seems counter-intuitive to Nozick, and many other people.
There seem to be two possible reasons for this. One, of course, is that most people's intuitions are wrong in this particular case. The reason I am interested in exploring, however, is the other one: namely, that equality is valuable for its own sake, not just as a side effect of diminishing marginal utility.
Now, before I go any further I should clarify what I mean by "equality." There are many different types of equality, not all of which are compatible with each other. What I mean is equality of utility: everyone has the same level of satisfied preferences, happiness, and whatever else "utility" consists of. This is not the same thing as fiscal equality, as some people may differ in their ability to convert money and resources into utility (people with horrible illnesses, for instance, are worse at doing so than the general population). It is also important to stress that "lifespan" should be factored in as part of the utility that is to be equalized (i.e. killing someone increases inequality). Otherwise one could achieve equality of utility by killing all the poor people.
So if equality is valuable for its own sake, how does one factor it into utilitarian calculations? It seems wrong to replace utility maximization with equality maximization. That would imply that a world where everyone had 10 utilons and a world where everyone had 100 utilons are morally identical, which seems wrong, to say the least.
What about making equality lexically prior to utility maximization? That seems just as bad. It would imply, among other things, that in a stratified world where some people have far greater levels of utility than others, it would be morally right to take an action that harmed every single person in the world, as long as it hurt the best off slightly more than the worst off. That seems insanely wrong. The Utility Monster thought experiment already argues against making utility maximization lexically prior to equality.
So it seems like the best option would be to have maximizing utility and increasing equality as two separate values. How, then, to trade one off against the other? If there is some sort of straight, one-to-one exchange rate between them, this does nothing to dissolve the problem of the Utility Monster. A monster good enough at utility generation could simply produce so much utility that no amount of equality could equal its output.
The best possible solution I can see would be to have utility maximization and equality have diminishing returns relative to each other. This would mean that in a world with high equality, but low utility, raising utility would be more important, while in a world of low equality and high utility, establishing equality would be more important.
This solution deals with the utility monster fairly effectively. No matter how much utility the monster can generate, it is always better to share some of its resources with other people.
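Here is a minimal numerical sketch of that claim, assuming square-root (concave) curves for both terms and using the worst-off person's utility as a crude equality proxy. All of these modeling choices are mine, not anything canonical:

```python
import math

def welfare(utilities):
    # Toy social welfare with diminishing returns on both dimensions:
    # concave in total utility, and concave in "equality", crudely
    # proxied here by the worst-off individual's utility.
    return math.sqrt(sum(utilities)) + math.sqrt(min(utilities))

def outcome(resources, monster_rate, population, monster_share):
    # The monster converts resources to utility at monster_rate;
    # ordinary people convert at rate 1 and split the rest evenly.
    monster = monster_rate * resources * monster_share
    each = resources * (1 - monster_share) / population
    return [monster] + [each] * population

R, POP = 100.0, 10
grid = [i / 1000 for i in range(1001)] + [1 - 10**-k for k in range(4, 10)]
for rate in (10, 1_000, 100_000):   # ever more talented monsters
    best = max(grid, key=lambda s: welfare(outcome(R, rate, POP, s)))
    print(f"monster rate {rate:>7}: optimal monster share = {best}")
```

Under these assumptions the welfare-maximizing share the monster keeps stays strictly below 1.0 no matter how talented it is; it only creeps closer to 1.0 as the monster's efficiency grows, which matches the point made above.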
Now, you might notice that this doesn't eliminate every aspect of the utility monster problem. As long as the returns generated by utility maximization do not diminish to zero you can always posit an even more talented monster. And you can then argue that the society created by having that monster enslave the rest of the populace is better than one where a less talented monster shares with the rest of the populace. However, this new society would instantly become better if the new Utility Monster was forced to share its resources with the rest of the population.
This is a huge improvement over the old framework. Ordinary utility-maximizing ethics would not merely argue that a world where a Utility Monster enslaved everyone else might be a better world; it would argue that it was the optimal world, the best possible world given the constraints the inhabitants face. Under this new ethical framework, however, that is never the case. The optimal world, under any given level of constraints, is one where a utility monster shares with the rest of the population.
In other words, under this framework, if you were to ask, "Is it good for a utility monster to enslave the rest of the population?" the answer would always be "No."
Obviously the value of equality has many other aspects to be considered. For instance, is it better described by traditional egalitarianism, or by prioritarianism? Values are often more complex than they first appear.
It also seems quite possible that there are other facets of value besides maximizing utility and equality of utility. For instance, total and average utilitarianism might be reconciled by making them two separate values that are both important. Other potential candidates include prioritarian concerns (if they are not included already), number of worthwhile lives (most people would consider a world full of people with excellent lives better than one inhabited solely by one ecstatic utility monster), consideration of prior-existing people, and perhaps many, many more. As with utility and equality, these values would have diminishing returns relative to each other, and an optimum society would be one where all receive some measure of consideration.
An aside. This next section is not directly related to the rest of the essay, but develops the idea in a direction I thought was interesting:
It seems to me that the value of equality could be the source of a local disagreement in population ethics. There are several people (Robin Hanson, most notably) who have argued that it would be highly desirable to create huge numbers of poor people with lives barely worth living, and that this may well be better than having a smaller, wealthier population. Many other people consider this to be a bad idea.
The unspoken assumption in this argument is that multiple lives barely worth living generate more utility than a single very excellent life. At first this seems like an obvious truth, based on the following chain of logic:
1. It is obviously wrong for Person A, who has a life barely worth living, to kill Person B, who also has a life barely worth living, and use B's property to improve their own life.
2. The only reason something is wrong is that it decreases the level of utility.
3. Therefore, killing Person B must decrease the level of utility.
4. Therefore, two lives barely worth living must generate more utility than a single excellent life.
However, if equality is valued for its own sake, then the reason it is wrong to kill Person B might be because of the vast inequality in various aspects of utility (lifespan, for instance) that their death would create between A and B.
This means that a society that has a smaller population living great lives might very well be generating a much larger amount of utility than a larger society whose inhabitants live lives barely worth living.
An article from the Wall Street Journal. The original title might be slightly mind-killing for some people, but I found it moderately interesting especially considering that many LessWrongers formed part of the data set for the study the article talks about and a large fraction of us identified as libertarian on the last survey.
Inside the Cold, Calculating Libertarian Mind
An individual's personality shapes his or her political ideology at least as much as circumstances, background and influences. That is the gist of a recent strand of psychological research identified especially with the work of Jonathan Haidt. The baffling (to liberals) fact that a large minority of working-class white people vote for conservative candidates is explained by psychological dispositions that override their narrow economic interests.
In his recent book "The Righteous Mind," Dr. Haidt confronted liberal bafflement and made the case that conservatives are motivated by morality just as liberals are, but also by a larger set of moral "tastes"—loyalty, authority and sanctity, in addition to the liberal tastes for compassion and fairness. Studies show that conservatives are more conscientious and sensitive to disgust but less tolerant of change; liberals are more empathic and open to new experiences.
But ideology does not have to be bipolar. It need not fall on a line from conservative to liberal. In a recently published paper, Ravi Iyer from the University of Southern California, together with Dr. Haidt and other researchers at the data-collection platform YourMorals.org, dissect the personalities of those who describe themselves as libertarian.
These are people who often call themselves economically conservative but socially liberal. They like free societies as well as free markets, and they want the government to get out of the bedroom as well as the boardroom. They don't see why, in order to get a small-government president, they have to vote for somebody who is keen on military spending and religion; or to get a tolerant and compassionate society they have to vote for a large and intrusive state.
The study collated the results of 16 personality surveys and experiments completed by nearly 12,000 self-identified libertarians who visited YourMorals.org. The researchers compared the libertarians to tens of thousands of self-identified liberals and conservatives. It was hardly surprising that the team found that libertarians strongly value liberty, especially the "negative liberty" of freedom from interference by others. Given the philosophy of their heroes, from John Locke and John Stuart Mill to Ayn Rand and Ron Paul, it also comes as no surprise that libertarians are also individualistic, stressing the right and the need for people to stand on their own two feet, rather than the duty of others, or government, to care for people.
Perhaps more intriguingly, when libertarians reacted to moral dilemmas and in other tests, they displayed less emotion, less empathy and less disgust than either conservatives or liberals. They appeared to use "cold" calculation to reach utilitarian conclusions about whether (for instance) to save lives by sacrificing fewer lives. They reached correct, rather than intuitive, answers to math and logic problems, and they enjoyed "effortful and thoughtful cognitive tasks" more than others do.
The researchers found that libertarians had the most "masculine" psychological profile, while liberals had the most feminine, and these results held up even when they examined each gender separately, which "may explain why libertarianism appeals to men more than women."
All Americans value liberty, but libertarians seem to value it more. For social conservatives, liberty is often a means to the end of rolling back the welfare state, with its lax morals and redistributive taxation, so liberty can be infringed in the bedroom. For liberals, liberty is a way to extend rights to groups perceived to be oppressed, so liberty can be infringed in the boardroom. But for libertarians, liberty is an end in itself, trumping all other moral values.
Dr. Iyer's conclusion is that libertarians are a distinct species—psychologically as well as politically.
A version of this article appeared September 29, 2012, on page C4 in the U.S. edition of The Wall Street Journal, with the headline: Inside the Cold, Calculating Libertarian Mind.
The original paper.
Understanding Libertarian Morality: The Psychological Roots of an Individualist Ideology
Abstract: Libertarians are an increasingly vocal ideological group in U.S. politics, yet they are understudied compared to liberals and conservatives. Much of what is known about libertarians is based on the writing of libertarian intellectuals and political leaders, rather than surveying libertarians in the general population. Across three studies, 15 measures, and a large web-based sample (N = 152,239), we sought to understand the morality of self-described libertarians. Based on an intuitionist view of moral judgment, we focused on the underlying affective and cognitive dispositions that accompany this unique worldview. We found that, compared to liberals and conservatives, libertarians show 1) stronger endorsement of individual liberty as their foremost guiding principle and correspondingly weaker endorsement of other moral principles, 2) a relatively cerebral as opposed to emotional intellectual style, and 3) lower interdependence and social relatedness. Our findings add to a growing recognition of the role of psychological predispositions in the organization of political attitudes.
I was surprised to see the high number of moral realists on Less Wrong, so I thought I would bring up a (probably unoriginal) point that occurred to me a while ago.
Let's say that all your thoughts either seem factual or fictional. Memories seem factual, stories seem fictional. Dreams seem factual, daydreams seem fictional (though they might seem factual if you're a compulsive fantasizer). Although the things that seem factual match up reasonably well to the things that actually are factual, this isn't the case axiomatically. If deviating from this pattern is adaptive, evolution will select for it. This could result in situations like: the rule that pieces move diagonally in checkers seems fictional, while the rule that you can't kill people seems factual, even though they're both just conventions. (Yes, the rule that you can't kill people is a very good convention, and it makes sense to have heavy default punishments for breaking it. But I don't think it's different in kind from the rule that you must move diagonally in checkers.)
I'm not an expert, but it definitely seems as though this could actually be the case. Humans are fairly conformist social animals, and it seems plausible that evolution would've selected for taking the rules seriously, even if it meant using the fact-processing system for things that were really just conventions.
Another spin on this: We could see philosophy as the discipline of measuring, collating, and making internally consistent our intuitions on various philosophical issues. Katja Grace has suggested that the measurement of philosophical intuitions may be corrupted by the desire to signal on the part of the philosophy enthusiasts. Could evolutionary pressure be an additional source of corruption? Taking this idea even further, what do our intuitions amount to at all aside from a composite of evolved and encultured notions? If we're talking about a question of fact, one can overcome evolution/enculturation by improving one's model of the world, performing experiments, etc. (I was encultured to believe in God by my parents. God didn't drop proverbial bowling balls from the sky when I prayed for them, so I eventually noticed the contradiction in my model and deconverted. It wasn't trivial--there was a high degree of enculturation to overcome.) But if the question has no basis in fact, like the question of whether morals are "real", then genes and enculturation will wholly determine your answer to it. Right?
Yes, you can think about your moral intuitions, weigh them against each other, and make them internally consistent. But this is kind of like trying to add resolution back into an extremely pixelated photo--just because it's no longer obviously "wrong" doesn't guarantee that it's "right". And there's the possibility of path-dependence--the parts of the photo you try to improve initially could have a very significant effect on the final product. Even if you think you're willing to discard your initial philosophical conclusions, there's still the possibility of accidentally destroying your initial intuitional data or enculturing yourself with your early results.
To avoid this possibility of path-dependence, you could carefully document your initial intuitions, pursue lots of different paths to making them consistent in parallel, and maybe even choose a "best match". But it's not obvious to me that your initial mix of evolved and encultured values even deserves this preferential treatment.
Currently, I disagree with what seems to be the prevailing view on Less Wrong that achieving a Really Good Consistent Match for our morality is Really Darn Important. I'm not sure that randomness from evolution and enculturation should be treated differently from random factors in the intuition-squaring process. It's randomness all the way through either way, right? The main reason "bad" consistent matches are considered so "bad", I suspect, is that they engender cognitive dissonance (e.g. maybe my current ethics says I should hack Osama Bin Laden to death in his sleep with a knife if I get the chance, but this is an extremely bad match for my evolved/encultured intuitions, so I experience a ton of cognitive dissonance actually doing this). But cognitive dissonance seems to me like just another aversive experience to factor in to my utility calculations.
Now that you've read this, maybe your intuition has changed and you're a moral anti-realist. But in what sense has your intuition "improved" or become more accurate?
I really have zero expertise on any of this, so if you have relevant links please share them. But also, who's to say that matters? In what sense could philosophers have "better" philosophical intuition? The only way I can think of for theirs to be "better" is if they've seen a larger part of the landscape of philosophical questions, and are therefore better equipped to build consistent philosophical models (example).
The following is a dialogue intended to illustrate what I think may be a serious logical flaw in some of the conclusions drawn from the famous Mere Addition Paradox.
EDIT: To make this clearer, the interpretation of the Mere Addition Paradox this post is intended to criticize is the belief that a world consisting of a large population full of lives barely worth living is the optimal world. That is, I am disagreeing with the idea that the best way for a society to use the resources available to it is to create as many lives barely worth living as possible. Several commenters have argued that another interpretation of the Mere Addition Paradox is that a sufficiently large population with a lower quality of life will always be better than a smaller population with a higher quality of life, even if such a society is far from optimal. I agree that my argument does not necessarily refute this interpretation, but think the other interpretation is common enough that it is worth arguing against.
EDIT: On the advice of some of the commenters I have added a shorter summary of my argument in non-dialogue form at the end. Since it is shorter I do not think it summarizes my argument as completely as the dialogue, but feel free to read it instead if pressed for time.
Bob: Hi, I'm with R&P cable. We're selling premium cable packages to interested customers. We have two packages to start out with that we're sure you'll love. Package A+ offers a larger selection of basic cable channels and costs $50. Package B offers a larger variety of exotic channels for connoisseurs; it costs $100. If you buy package A+, however, you'll get a 50% discount on B.
Alice: That's very nice, but looking at the channel selection, I just don't think that it will provide me with enough utilons.
Bob: Utilons? What are those?
Alice: They're the unit I use to measure the utility I get from something. I'm really good at shopping, so if I spend my money on the things I usually spend it on I usually get 1.5 utilons for every dollar I spend. Now, looking at your cable channels, I've calculated that I will get 10 utilons from buying Package A+ and 100 utilons from buying Package B. Obviously the total is 110, significantly less than the 150 utilons I'd get from spending $100 on other things. It's just not a good deal for me.
Bob: You think so? Well it so happens that I've met people like you in the past and have managed to convince them. Let me tell you about something called the "Mere Cable Channel Addition Paradox."
Alice: Alright, I've got time, make your case.
Bob: Imagine that the government is going to give you $50. Sounds like a good thing, right?
Alice: It depends on where it gets the $50 from. What if it defunds a program I think is important?
Bob: Let's say that it would defund a program that you believe is entirely neutral. The harms the program causes are exactly outweighed by the benefits it brings, leaving a net utility of zero.
Alice: I can't think of any program like that, but I'll pretend one exists for the sake of the argument. Yes, defunding it and giving me $50 would be a good thing.
Bob: Okay, now imagine the program's beneficiaries put up a stink, and demand the program be re-instituted. That would be bad for you, right?
Alice: Sure. I'd be out $50 that I could convert into 75 utilons.
Bob: Now imagine that the CEO of R&P Cable Company sleeps with an important senator and arranges a deal. You get the $50, but you have to spend it on Package A+. That would be better than not getting the money at all, right?
Alice: Sure. 10 utilons is better than zero. But getting to spend the $50 however I wanted would be best of all.
Bob: That's not an option in this thought experiment. Now, imagine that after you use the money you received to buy Package A+, you find out that the 50% discount for Package B still applies. You can get it for $50. Good deal, right?
Alice: Again, sure. I'd get 100 utilons for $50. Normally I'd only get 75 utilons.
Bob: Well, there you have it. By a mere addition I have demonstrated that a world where you have bought both Package A+ and Package B is better than one where you have neither. The only difference between the hypothetical world I imagined and the world we live in is that in one you are spending money on cable channels. A mere addition. Yet you have admitted that that world is better than this one. So what are you waiting for? Sign up for Package A+ and Package B!
And that's not all. I can keep adding cable packages to get the same result. The end result of my logic, which I think you'll agree is impeccable, is that you purchase Package Z, a package where you spend all the money other than that you need for bare subsistence on cable television packages.
Alice: That seems like a pretty repugnant conclusion.
Bob: It still follows from the logic. For every world where you are spending your money on whatever you have calculated generates the most utilons there exists another, better world where you are spending all your money on premium cable channels.
Alice: I think I found a flaw in your logic. You didn't perform a "mere addition." The hypothetical world differs from ours in two ways, not one. Namely, in this world the government isn't giving me $50. So your world doesn't just differ from this one in terms of how many cable packages I've bought, it also differs in how much money I have to buy them.
Bob: So can I interest you in a special form of the package? This one is in the form of a legally binding pledge. You pledge that if you ever make an extra $50 in the future you will use it to buy Package A+.
Alice: No. In the scenario you describe the only reason buying Package A+ has any value is that it is impossible to get utility out of that money any other way. If I just get $50 for some reason it's more efficient for me to spend it normally.
Bob: Are you sure? I've convinced a lot of people with my logic.
Alice: Like who?
Bob: Well, there were these two customers named Michael Huemer and Robin Hanson who both accepted my conclusion. They've both mortgaged their homes and started sending as much money to R&P cable as they can.
Alice: There must be some others who haven't.
Bob: Well, there was this guy named Derek Parfit who seemed disturbed by my conclusion, but couldn't refute it. The best he could do is mutter something about how the best things in his life would gradually be lost if he spent all his money on premium cable. I'm working on him though, I think I'll be able to bring him around eventually.
Alice: Funny you should mention Derek Parfit. It so happens that the flaw in your "Mere Cable Channel Addition Paradox" is exactly the same as the flaw in a famous philosophical argument he made, which he called the "Mere Addition Paradox."
Bob: Really? Do tell?
Alice: Parfit posited a population he called "A" which had a moderately large population with large amounts of resources, giving them a very high level of utility per person. Then he added a second population, which was totally isolated from the other population. How they were isolated wasn't important, although Parfit suggested maybe they were on separate continents and couldn't sail across the ocean or something like that. These people don't have nearly as many resources per person as the other population, so each person's level of utility is lower (their lack of resources is the only reason they have lower utility). However, their lives are still just barely worth living. He called the two populations "A+."
Parfit asked if "A+" was a better world than "A." He thought it was: since the extra people were totally isolated from the original population, they weren't hurting anyone over there by existing. And their lives were worth living. Follow me so far?
Bob: I guess I can see the point.
Alice: Next Parfit posited a population called "B," which was the same as A+, except that the two populations had merged together. Maybe they got better at sailing across the ocean, it doesn't really matter how. The people share their resources. The result is that everyone in the original population had their utility lowered, while everyone in the second had it raised.
Parfit asked if population "B" was better than "A+" and argued that it was because it had a greater level of equality and total utility.
Bob: I think I see where this is going. He's going to keep adding more people, isn't he?
Alice: Yep. He kept adding more and more people until he reached population "Z," a vast population where everyone had so few resources that their lives were barely worth living. This, he argued, was a paradox, because he argued that most people would believe that Z is far worse than A, but he had made a convincing argument that it was better.
Bob: Are you sure that sharing their resources like that would lower the standard of living for the original population? Wouldn't there be economies of scale and such that would allow them to provide more utility even with less resources per person?
Alice: Please don't fight the hypothetical. We're assuming that it would for the sake of the argument.
Now, Parfit argued that this argument led to the "Repugnant Conclusion," the idea that the best sort of world is one with a large population with lives barely worth living. That confers on people a duty to reproduce as often as possible, even if doing so would lower the quality of their and everyone else's lives.
He claimed that the reason his argument showed this was that he had conducted "mere addition." The populations in his paradox differed in no way other than their size. By merely adding more people he had made the world "better," even if the level of utility per person plummeted. He claimed that "For every population, A, with a high average level of utility there exists another, better population, B, with more people and a lower average level of utility."
Do you see the flaw in Parfit's argument?
Bob: No, and that kind of disturbs me. I have kids, and I agree that creating new people can add utility to the world. But it seems to me that it's also important to enhance the utility of the people who already exist.
Alice: That's right. Normal morality tells us that creating new people with lives worth living and enhancing the utility of people that already exist are both good things to use resources on. Our common sense tells us that we should spend resources on both those things. The disturbing thing about the Mere Addition Paradox is that it seems at first glance to indicate that that's not true, that we should only devote resources to creating more people with barely worthwhile lives. I don't agree with that, of course.
Bob: Neither do I. It seems to me that having a large number of worthwhile lives and a high average utility are both good things and that we should try to increase them both, not just maximize one.
Alice: You're right, of course. But don't say "having a high average utility." Say "use resources to increase the utility of people who already exist."
Bob: What's the difference? They're the same thing, aren't they?
Alice: Not quite. There are other ways to increase average utility than enhancing the utility of existing people. You could kill all the depressed people, for instance. Plus, if there was a world where everyone was tortured 24 hours a day, you could increase average utility by creating some new people who are only tortured 23 hours a day.
Bob: That's insane! Who could possibly be that literal-minded?
Alice: You'd be surprised. The point is, a better way to phrase it is "use resources to increase the utility of people who already exist," not "increase average utility." Of course, that still leaves some stuff out, like the fact that it's probably better to increase everyone's utility equally, rather than focus on just one person. But it doesn't lead to killing depressed people, or creating slightly less tortured people in a Hellworld.
Bob: Okay, so what I'm trying to say is that resources should be used to create people, and to improve people's lives. Also, equality is good. And none of these things should completely eclipse the others; each is too valuable to be sacrificed entirely to maximize just one. So a society that increases all of those values should be considered more efficient at generating value than a society that just maximizes one value. Now that we're done getting our terminology straight, will you tell me what Parfit's mistake was?
Alice: Population "A" and population "A+" differ in two ways, not one. Think about it. Parfit is clear that the extra people in "A+" do not harm the existing people when they are added. That means they do not use any of the original population's resources. So how do they manage to live lives worth living? How are they sustaining themselves?
Bob: They must have their own resources. To use Parfit's example of continents separated by an ocean; each continent must have its own set of resources.
Alice: Exactly. So "A+" differs from "A" both in the size of its population, and the amount of resources it has access to. Parfit was not "merely adding" people to the population. He was also adding resources.
Bob: Aren't you the one who is fighting the hypothetical now?
Alice: I'm not fighting the hypothetical. Fighting the hypothetical consists of challenging the likelihood of the thought experiment happening, or trying to take another option than the ones presented. What I'm doing is challenging the logical coherence of the hypothetical. One of Parfit's unspoken premises is that you need some resources to live a life worth living, so by adding more worthwhile lives he's also implicitly adding resources. If he had just added some extra people to population A without giving them their own continent full of extra resources to live on then "A+" would be worse than "A."
Bob: So the Mere Addition Paradox doesn't confer on us a positive obligation to have as many children as possible, because the amount of resources we have access to doesn't automatically grow with them. I get that. But doesn't it imply that as soon as we get some more resources we have a duty to add some more people whose lives are barely worth living?
Alice: No. Adding lives barely worth living uses the extra resources more efficiently than leaving Parfit's second continent empty for all eternity. But, it's not the most efficient way. Not if you believe that creating new people and enhancing the utility of existing people are both important values.
Let's take population "A+" again. Now imagine that instead of having a population of people with lives barely worth living, the second continent is inhabited by a smaller population with the same very high level of resources and utility per person as the population of the first continent. Call it "A++." Would you say "A++" was better than "A+?"
Bob: Sure, definitely.
Alice: How about a world where the two continents exist, but the second one was never inhabited? The people of the first continent then discover the second one and use its resources to improve their level of utility.
Bob: I'm less sure about that one, but I think it might be better than "A+."
Alice: So what Parfit actually proved was: "For every population, A, with a high average level of utility there exists another, better population, B, with more people, access to more resources and a lower average level of utility."
And I can add my own corollary to that: "For every population, B, there exists another, better population, C, that has the same access to resources as B, but a smaller population and higher average utility."
Bob: Okay, I get it. But how does this relate to my cable TV sales pitch?
Alice: Well, my current situation, where I'm spending my money on normal things is analogous to Parfit's population "A." High utility, and very efficient conversion of resources into utility, but not as many resources. We're assuming, of course, that using resources to both create new people and improve the utility of existing people is more morally efficient than doing just one or the other.
The situation where the government gives me $50 to spend on Package A+ is analogous to Parfit's population A+. I have more resources and more utility. But the resources aren't being converted as efficiently as they could be.
The situation where I take the 50% discount and buy Package B is equivalent to Parfit's population B. It's a better situation than A+, but not the most efficient way to use the money.
The situation where I get the $50 from the government to spend on whatever I want is equivalent to my population C. A world with more access to resources than A, but more efficient conversion of resources to utility than A+ or B.
Bob: So what would a world where the government kept the money be analogous to?
Alice: A world where Parfit's second continent was never settled and remained uninhabited for all eternity, its resources never used by anyone.
Bob: I get it. So the Mere Addition Paradox doesn't prove what Parfit thought it did? We don't have any moral obligation to tile the universe with people whose lives are barely worth living?
Alice: Nope, we don't. It's more morally efficient to use a large percentage of our resources to enhance the lives of those who already exist.
Bob: This sure has been a fun conversation. Would you like to buy a cable package from me? We have some great deals.
My argument is that Parfit's Mere Addition Paradox doesn't prove what it seems to. The argument behind the Mere Addition Paradox is that you can make the world a better place by the "mere addition" of extra people, even if their lives are barely worth living. In other words: "For every population, A, with a high average level of utility there exists another, better population, B, with more people and a lower average level of utility." This supposedly leads to the Repugnant Conclusion, the belief that a world full of people whose lives are barely worth living is better than a world with a smaller population where the people lead extremely fulfilled and happy lives.
Parfit demonstrates this by moving from world A, consisting of a population full of people with lots of resources and high average utility, to world A+. World A+ has an additional population of people who are isolated from the original population and not even aware of the other's existence. The extra people live lives just barely worth living. Parfit argues that A+ is a better world than A because everyone in it has lives worth living, and the additional people aren't hurting anyone by existing because they are isolated from the original population.
Parfit then moves from World A+ to World B, where the populations are merged and share resources. This lowers the standard of living for the original people and raises it for the newer people. Parfit argues that B must be better than A+, because it has higher total utility and equality. He then keeps adding people until he reaches Z, a world where everyone's lives are barely worth living and the population is vast. He argues that this is a paradox because most people would agree that Z is not a desirable world compared to A.
I argue that the Mere Addition Paradox is a flawed argument because it does not just add people, it also adds resources. The fact that the extra people in A+ do not harm the original people of A by existing indicates that their population must have a decent amount of resources to live on, even if it is not as many per person as the population of A. For this reason what the Mere Addition Paradox proves is not that you can make the world better by adding extra people, but rather that you can make it better by adding extra people and resources to support them. I use a series of choices about purchasing cable television packages to illustrate this in concrete terms.
I further argue for a theory of population ethics that values both using resources to create lives worth living, and using resources to enhance the utility of already existing people, and considers the best sort of world to be one where neither of these two values totally dominate the other. By this ethical standard A+ might be better than A because it has more people and resources, even if the average level of utility is lower. However, a world with the same amount of resources as A+, but a lower population and the same, or higher average utility as A is better than A+.
The main unsatisfying thing about my argument is that while it avoids the Repugnant Conclusion in most cases, it might still lead to it, or something close to it, in situations where creating new people and getting new resources are, as one commenter noted, a “package deal.” In other words, a situation where it is impossible to obtain new resources without creating some new people whose utility levels are below average. However, even in this case, my argument holds that the best world of all is one where it would be possible to obtain the resources without creating new people, or creating a smaller amount of people with higher utility.
In other words, the Mere Addition Paradox does not prove that: "For every population, A, with a high average level of utility there exists another, better population, B, with more people and a lower average level of utility." Instead what the Mere Addition Paradox seems to demonstrate is that: "For every population, A, with a high average level of utility there exists another, better population, B, with more people, access to more resources and a lower average level of utility." Furthermore, my own argument demonstrates that: "For every population, B, there exists another, better population, C, which has the same access to resources as B, but a smaller population and higher average utility."
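To make these claims concrete, here is a toy model in Python with invented numbers (the populations and resource figures are mine, not Parfit's; I also assume each person's utility simply equals their share of resources, which is all the argument needs):

```python
# A toy model of the summary above, with invented numbers (mine, not
# Parfit's). Assumption: each person's utility simply equals their
# share of their group's resources, so average utility is just
# resources per person.

def world(name, groups):
    """groups: list of (population, resources) pairs."""
    pop = sum(p for p, _ in groups)
    res = sum(r for _, r in groups)
    print(f"{name}: pop={pop}, resources={res}, avg utility={res / pop:.1f}")

world("A",  [(100, 1000)])              # one well-off continent
world("A+", [(100, 1000), (100, 200)])  # adds 100 people AND 200 resources
world("B",  [(200, 1200)])              # merged: same totals as A+, shared equally
world("C",  [(120, 1200)])              # same resources as B, fewer people

# A+ is not "mere addition": it has 1200 resources where A had 1000.
# B matches A+ in totals (this crude model ignores equality), and C
# illustrates the corollary: same resources as B, higher average utility.
```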
So, a little background- I've just come out as an atheist to my dad, a Christian pastor, who's convinced he can "fix" my thinking and is bombarding me with a number of flimsy arguments that I'm having trouble articulating a response to, and need help shutting down. The particular issue at the moment deals with non-theistic explanations for human psychology and things like love, morality, and beauty. After attempting to communicate explanations from evolutionary psychology, I was met with amused dismissal of the subject as "speculation".
There's one book in particular he's having me read- The Reason for God by Timothy Keller. In the book, he brings up evolutionary psychology as an alternative to theistic explanations, and immediately dismisses it as apparently self-defeating.
"Evolutionists say that if God makes sense to us, it is not because he is really there, it's only because that belief helped us survive and so we are hardwired for it. However, if we can't trust our belief-forming faculties to tell us the truth about God, why should we trust them to tell us the truth about anything, including evolutionary science? If our cognitive faculties only tell us what we need to survive, not what is true, why trust them about anything at all?" -Timothy Keller
The obvious answer is that knowing the truth about things is generally advantageous to survival- but it hardly addresses the underlying assertion- that without [incredibly specific collection of god-beliefs and assorted dogmas], human brains can't arrive at truth because they weren't designed for it. And of course, I'm talking to a guy with an especially exacting definition of "truth" (100% certainty about the territory)- I could use an LW post that succinctly discusses the role and definition of truth, there.
Another thing Dad likes to do is back me into a corner WRT morality and moral relativism- "Oh, but can you really believe that the act of rape doesn't have an inherent [wrongness]? Are you saying it was justified for [insert historical monster] to do [atrocity] because it would make him reproductively successful?" Armed only with evolutionary explanations for their behavior, I couldn't really respond- possibly my fault, since I haven't read the Morality sequence on account of I got stuck in the Quantum Physics ultrasequence, and knowing that reality is composed of complex amplitudes flowing between explicit configurations or aaasasdjgasjdga whatever the frig even (I CAN'T) has proven to be staggeringly unhelpful in this situation.
In addition to particular arguments WRT the question posed, I could also use recommendations for good, well-argued and accessible books on the subject of evolutionary psychology, with a focus on practical experimental results and application- the guy can't be given a book and not read it, so I'm hoping to at least get him to not dismiss the science as "speculation" or a joke. It's likely he's aware that the field of evolutionary psychology is really prone to hindsight bias and thus ignores it completely, so along with the book, a good article or study demonstrating the accuracy and predictive power of the evolutionary psychological model would be appreciated.
Robin Hanson has done a great job of describing the future world and economy, under the assumption that easily copied "uploads" (whole brain emulations), and the standard laws of economics continue to apply. To oversimplify the conclusion:
- There will be great and rapidly increasing wealth. On the other hand, the uploads will be in Darwinian-like competition with each other and with copies, which will drive their wages down to subsistence levels: whatever is required to run their hardware and keep them working, and nothing more.
The competition will not so much be driven by variation, but by selection: uploads with the required characteristics can be copied again and again, undercutting and literally crowding out any uploads wanting higher wages.
Some have focused on the possibly troubling aspects of voluntary or semi-voluntary death: some uploads would be willing to make copies of themselves for specific tasks, which would then be deleted or killed at the end of the process. This can pose problems, especially if the copy changes its mind about deletion. But much more troubling is the mass death among uploads that always wanted to live.
What the selection process will favour is agents that want to live (if they didn't, they'd die out) and are willing to work for an expectation of subsistence-level wages. But now add a little risk to the process: not all jobs pay exactly the expected amount; sometimes they pay slightly more, sometimes slightly less. Since wages sit at subsistence, there is no margin to absorb a shortfall, so half of all jobs will result in a life-loving upload dying (charging extra to pay for insurance would squeeze that upload out of the market). Iterating the process means that the vast majority of the uploads will end up being killed - if not initially, then at some point later. The picture changes somewhat if you consider "super-organisms" of uploads and their copies, but then the issue simply shifts to wage competition between the super-organisms.
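A minimal simulation of this iteration (the model is my own toy assumption, not from the original argument: symmetric noise around a subsistence wage, no savings margin, no insurance):

```python
# Toy simulation: every upload earns exactly subsistence in
# expectation, with symmetric noise, and any below-subsistence job is
# fatal. (All parameters here are illustrative assumptions.)
import random

random.seed(0)
N_UPLOADS, N_JOBS = 10_000, 50
alive = N_UPLOADS

for _ in range(N_JOBS):
    # Each surviving upload's pay is subsistence + noise; with symmetric
    # noise, each has a 1/2 chance of falling short and being deleted.
    alive = sum(1 for _ in range(alive) if random.gauss(0, 1) >= 0)

print(f"original uploads surviving {N_JOBS} jobs: {alive} of {N_UPLOADS}")
# Expected survivors: 10_000 * 0.5**50, i.e. essentially zero; the
# population persists only through copies of the lucky.
```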
The only way this can be considered acceptable is if the killing of a (potentially unique) agent that doesn't want to die is exactly compensated by the copying of another already existent agent. I don't find myself in the camp arguing that that would be a morally neutral or positive action.
I recently learned that a friend of mine, and a long-time atheist (and atheist blogger), is planning to convert to Catholicism. It seems the impetus for her conversion was increasing frustration that she had no good naturalistic account for objective morality in the form of virtue ethics; that upon reflection, she decided she felt like morality "loved" her; that this feeling implied God; and that she had sufficient "if God, then Catholicism" priors to point toward Catholicism, even though she's bisexual (!) and purports to still feel uncertain about the Church's views on sexuality. (Side note: all of this information is material she's blogged about herself, so it's not as if I'm sharing personal details she would prefer to be kept private.)
First, I want to state the rationality lesson I learned from this episode: atheists who spend a great deal of their time analyzing and even critiquing the views of a particular religion are at-risk atheists. Eliezer's spoken about this sort of issue before ("Someone who spends all day thinking about whether the Trinity does or does not exist, rather than Allah or Thor or the Flying Spaghetti Monster, is more than halfway to Christianity."), but I guess it took a personal experience to really drive the point home. When I first read my friend's post, I had a major "I notice that I am confused" moment, because it just seemed so implausible that someone who understood actual atheist arguments (as opposed to dead little sister Hollywood Atheism) could convert to religion, and Catholicism of all things. I seriously considered (and investigated) the possibility that her post was some kind of prank or experiment or otherwise not sincere, or that her account had been hijacked by a very good impersonator (both of these seem quite unlikely at this point).
But then I remembered how I had been frustrated in the past by her tolerance for what seemed like rank religious bigotry and how often I thought she was taking seriously theological positions that seemed about as likely as the 9/11 attacks being genuinely inspired and ordained by Allah. I remembered how I thought she had a confused conception of meta-ethics and that she often seemed skeptical of reductionism, which in retrospect should have been a major red flag for purported atheists. So yeah, spending all your time arguing about Catholic doctrine really is a warning sign, no matter how strongly you seem to champion the "atheist" side of the debate. Seriously.
But second, and more immediately, I wonder if anybody has advice on how to handle this, or if they've had similar experiences with their friends. I do care about this person, and I was devastated to hear this news, so if there's something I can do to help her, I want to. Of course, I would prefer most that she stop worrying about religion entirely and just grok the math that makes religious hypotheses so unlikely as to not be worth your time. But in the short term I'd settle for her not becoming a Catholic, and not immersing herself further in Dark Side Epistemology or surrounding herself with people trying to convince her that she needs to "repent" of her sexuality.
I think I have a pretty good understanding of the theoretical concepts at stake here, but I'm not sure where to start or what style of argument is likely to have the best effect at this point. My tentative plan is to express my concern, try to get more information about what she's thinking, and get a dialogue going (I expect she'll be open to this), but I wanted to see if you all had more specific suggestions, especially if you've been through similar experiences yourself. Thanks!
It's always good news when someone else develops an idea independently of you. It's a sign you might be onto something. Which is why I was excited to discover that Alan Carter, Professor Emeritus of the University of Glasgow's Department of Philosophy, has developed the concept of Complexity of Value independently of Less Wrong.
As far as I can tell Less Wrong does not know of Carter, the only references to his existence I could find on LW and OB were written by me. Whether Carter knows of LW or OB is harder to tell, but the only possible link I could find online was that he has criticized the views of Michael Huemer, who knows Bryan Caplan, who knows Robin Hanson. This makes it all the more interesting that Carter has developed views on value and morality very similar to ones commonly espoused on Less Wrong.
The Complexity of Value is one of the more important concepts in Less Wrong. It has been elaborated on its wiki page, as well as some classic posts by Eliezer. Carter has developed the same concept in numerous papers, although he usually refers to it as “a plurality of values” or “multidimensional axiology of value.” I will focus the discussion on working papers Carter has on the University of Glasgow’s website, as they can be linked to directly without having to deal with a pay wall. In particular I will focus on his paper "A Plurality of Values."
Carter begins the paper by arguing:
Wouldn’t it be nice if we were to discover that the physical universe was reducible to only one kind of fundamental entity? ... Wouldn’t it be nice, too, if we were to discover that the moral universe was reducible to only one kind of valuable entity—or one core value, for short? And wouldn’t it be nice if we discovered that all moral injunctions could be derived from one simple principle concerning the one core value, with the simplest and most natural thought being that we should maximize it? There would be an elegance, simplicity and tremendous justificatory power displayed by the normative theory that incorporated the one simple principle. The answers to all moral questions would, in theory at least, be both determinate and determinable. It is hardly surprising, therefore, that many moral philosophers should prefer to identify, and have thus sought, the one simple principle that would, hopefully, ground morality.
And it is hardly surprising that many moral philosophers, in seeking the one simple principle, should have presumed, explicitly or tacitly, that morality must ultimately be grounded upon the maximization of a solitary core value, such as quantity of happiness or equality, say. Now, the assumption—what I shall call the presumption of value-monism—that there is to be identified a single core axiological value that will ultimately ground all of our correct moral decisions has played a critical role in the development of ethical theory, for it clearly affects our responses to certain thought-experiments, and, in particular, our responses concerning how our normative theories should be revised or concerning which ones ought to be rejected.
Most members of this community will immediately recognize the similarities between these paragraphs and Eliezer’s essay “Fake Utility Functions.” The presumption of value monism sounds quite similar to Eliezer’s description of “someone who has discovered the One Great Moral Principle, of which all other values are a mere derivative consequence.” Carter's opinion of such people is quite similar to Eliezer's.
While Eliezer discovered the existence of the Complexity of Value by working on Friendly AI, Carter discovered it by studying some of the thornier problems in ethics, such as the Mere Addition Paradox and what Carter calls the Problem of the Ecstatic Psychopath. Many Less Wrong readers will be familiar with these problems; they have been discussed numerous times in the community.
For those who aren't, in brief: the Mere Addition Paradox shows that if one sets maximizing total wellbeing as the standard of value, one is led to what is commonly called the Repugnant Conclusion, the belief that a huge population of people with lives barely worth living is better than a somewhat smaller population of people with extremely worthwhile lives. The Problem of the Ecstatic Psychopath is the inverse: if one takes average wellbeing as the standard of value, a population of one immortal ecstatic psychopath, with a nonsentient machine to care for all its needs, is better than a population of trillions of very happy and satisfied, but not ecstatic, people.
Carter describes both of these problems in his paper and draws an insightful conclusion:
In short, surely the most plausible reason for the counter-intuitive nature of any mooted moral requirement to bring about, directly or indirectly, the world of the ecstatic psychopath is that either a large total quantity of happiness or a large number of worthwhile lives is of value; and surely the most plausible reason for the counter-intuitive nature of any mooted injunction to bring about, directly or indirectly, the world of the Repugnant Conclusion is that a high level of average happiness is also of value.
How is it that we fail to notice something so obvious? I submit: because we are inclined to dismiss summarily any value that fails to satisfy our desire for the one core value—in other words, because of the presumption of value-monism.
Once Carter has established the faults of value monism he introduces value pluralism to replace it.1 He introduces two values to start with, “number of worthwhile lives” and “the level of average happiness,” which both contribute to “overall value.” However, their contributions have diminishing returns,2 so a large population with low average happiness and a tiny population with extremely high average happiness are both worse than a moderately sized population with moderately high average happiness.
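To illustrate how such an axiology might behave, here is a minimal sketch in Python. The saturating functional form and all the constants are my own illustrative assumptions; Carter argues for the shape of the view, not for any particular formula:

```python
# A sketch of a "multidimensional axiology" with diminishing returns.
# The functional form and constants are invented for illustration.

def saturating(x, k):
    """Bounded contribution with diminishing returns, approaching 1."""
    return x / (x + k)

def overall_value(worthwhile_lives, avg_happiness):
    # Each value contributes, but saturates: neither a vast population
    # nor sky-high average happiness can substitute for the other.
    return saturating(worthwhile_lives, 1_000) + saturating(avg_happiness, 50)

print(overall_value(10**9, 0.1))   # Repugnant Conclusion world: ~1.00
print(overall_value(1, 10**6))     # ecstatic psychopath: ~1.00
print(overall_value(10**5, 500))   # moderate on both counts: ~1.90
```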
This is a fairly novel use of the idea of the complexity of value, as far as I know. I've read a great deal of Less Wrong's discussion of the Mere Addition Paradox, and most attempts to resolve it have consisted of either trying to reformulate Average Utilitarianism so that it does not lead to the Problem of the Ecstatic Psychopath, or redefining "a life barely worth living" upwards so that it is much less horrible than one would initially think. The idea of agreeing that increasing total wellbeing is important, but not the be-all and end-all of morality, did not seem to come up, although if it did and I missed it I'd be very happy if someone posted a link to that thread.
Carter’s resolution of the Mere Addition Paradox makes a great deal of sense, as it manages to avoid every single repugnant and counterintuitive conclusion that Total and Average Utilitarianism draw by themselves while still being completely logically consistent. In fact, I think that most people who reject the Repugnant Conclusion will realize that this was their True Rejection all along. I am tempted to say that Carter has discovered Theory X, the hypothetical theory of population ethics Derek Parfit believed could accurately describe the ethics of creating more people without implying any horrifying conclusions.
Carter does not stop there, however; he then moves to the problem of what he calls "pleasure wizards" (many readers may be more familiar with the term "utility monster"). The pleasure wizard can convert resources into utility much more efficiently than a normal person, and hence it can be argued that it deserves more resources. Carter points out that:
…such pleasure-wizards, to put it bluntly, do not exist... But their opposites do. And the opposites of pleasure-wizards—namely, those who are unusually inefficient at converting resources into happiness—suffice to ruin the utilitarian’s egalitarian pretensions. Consider, for example, those who suffer from, what are currently, incurable diseases. … an increase in their happiness would require that a huge proportion of society’s resources be diverted towards finding a cure for their rare condition. Any attempt at a genuine equality of happiness would drag everyone down to the level of these unfortunates. Thus, the total amount of happiness is maximized by diverting resources away from those who are unusually inefficient at converting resources into happiness. In other words, if the goal is, solely, to maximize the total amount of happiness, then giving anything at all to such people and spending anything on cures for their illnesses is a waste of valuable resources. Hence, given the actual existence of such unfortunates, the maximization of happiness requires a considerable inequality in its distribution.
Carter argues that, while most people don’t think all of society’s resources should be diverted to help the very ill, the idea that they should not be helped at all also seems wrong. He also points out that to a true utilitarian the nonexistence of pleasure wizards should be a tragedy:
So, the consistent utilitarian should greatly regret the non-existence of pleasure-wizards; and the utilitarian should do so even when the existence of extreme pleasure-wizards would morally require everyone else to be no more than barely happy.
Yet, this is not how utilitarians behave, he argues, rather:
As I have yet to meet a utilitarian, and certainly not a monistic one, who admits to thinking that the world would be a better place if it contained an extreme pleasure-wizard living alongside a very large population all at that level of happiness where their lives were just barely worth living…But if they do not bemoan the lack of pleasure-wizards, then they must surely value equality directly, even if they hide that fact from themselves. And this suggests that the smile of contentment on the faces of utilitarians after they have deployed diminishing marginal utility in an attempt to show that their normative theory is not incompatible with egalitarianism has more to do with their valuing of equality than they are prepared to admit.
Carter resolves the pleasure wizard problem by suggesting equality as an end in itself, a third contributing value towards overall value. Pleasure wizards should not get all the resources because equality is valuable for its own sake, not just because of diminishing marginal utility. As with average happiness and total worthwhile lives, equality is balanced against other values, rather than dominating them. It may often be ethical for a society to sacrifice some amount of equality to increase the total and average wellbeing.
Carter then briefly states that, though he only discusses three in this paper, there are many other dimensions of value that could be added. It might even be possible to add some form of deontological rules or virtue ethics to the complexity of value, although they would be traded off against consequentialist considerations. He concludes the paper by reiterating that:
Thus, in avoiding the Repugnant Conclusion, the Problem of the Ecstatic Psychopath and the problems posed by pleasure-wizards, as well as the problems posed by any unmitigated demand to level down, we appear to have identified an axiology that is far more consistent with our considered moral judgments than any entailing these counter-intuitive implications.
Carter has numerous other papers discussing the concept in more detail, but “A Plurality of Values” is the most thorough. Other good ones include “How to solve two addition paradoxes and avoid the Repugnant Conclusion,” which more directly engages the Mere Addition Paradox and some of its defenders like Michael Huemer; "Scrooge and the Pleasure Witch," which discusses pleasure wizards and equality in more detail; and “A pre-emptive response to some possible objections to a multidimensional axiology with variable contributory values,” which is exactly what it says on the tin.
On closer inspection it was not hard to see why Carter had developed theories so close to those of Eliezer and other members of Less Wrong and SIAI communities. In many ways their two tasks are similar. Eliezer and the SIAI are trying to devise a theory of general ethics that cannot be twisted into something horrible by a rules-lawyering Unfriendly AI, while Carter is trying to devise a theory of population ethics that cannot be twisted into something horrible by rules-lawyering humans. The worlds of the Repugnant Conclusion and the Ecstatic Psychopath are just the sort of places a poorly programmed AI with artificially simple values would create.
I was very pleased to see an important Less Wrong concept had a defender in mainstream academia. I was also pleased to see that Carter had not just been content to develop the concept of the Complexity of Value. He was also able to employ the concept in a new way, successfully resolving one of the major quandaries of modern philosophy.
2 Theodore Sider proposed a theory called "geometrism" in 1991 that also relies on diminishing returns, but geometrism is still a monist theory: it applies geometric diminishing returns to the people in the scenario, rather than to the values that creating those people was meant to serve.
Edited - To remove a reference to Aumann's Agreement Theorem that the commenters convinced me was unnecessary and inaccurate.
Just a minor thought connected with the orthogonality thesis: if you claim that any superintelligence will inevitably converge to some true code of morality, then you are also claiming that no measures can be taken by its creators to prevent this convergence. In other words, the superintelligence will be uncontrollable.
In Magical Categories, Eliezer criticizes using machine learning to learn the concept of "smile" from examples. "Smile" sounds simple to humans but is actually a very complex concept. It only seems simple to us because we find it useful.
If we saw pictures of smiling people on the left and other things on the right, we would realize that smiling people go to the left and categorize new things accordingly. A supervised machine learning algorithm, on the other hand, will likely learn something other than what we think of as "smile" (such as "containing things that pass the smiley face recognizer") and categorize molecular smiley faces as smiles.
This is because simplicity is subjective: a human will consider "happy" and "person" to be basic concepts, so the intended definition of smile as "expression of a happy person" is simple. A computational Occam's Razor will consider this correct definition to be a more complex concept than "containing things that pass the smiley face recognizer". I'll use the phrase "magical category" to refer to concepts that have a high Kolmogorov complexity but that people find simple.
I hope that it's possible to create conditions under which the computer will have an inductive bias towards magical categories, as humans do. I think that people find these concepts simple because they're useful to explain things that humans want to explain (such as interactions with people or media depicting people). The video has pixels arranged in this pattern because it depicts a person who is happy because he is eating chocolate.
So, maybe it's possible to learn these magical categories from a lot of data, by compressing the categorizer along with the data. Here's a sketch of a procedure for doing this:
1. Amass a large collection of data from various societies, containing photographs, text, historical records, etc.
2. Come up with many categories (say, one for each noun in a long list). For each category, decide which pieces of data fit the category.
3. Find categorizer_1, categorizer_2, ..., categorizer_n to minimize K(dataset + categorizer_1 + categorizer_2 + ... + categorizer_n)
What do these mean:
- K(x) is the Kolmogorov complexity of x; that is, the length of the shortest (program,input) pair that, when run, produces x. This is uncomputable so it has to be approximated (such as through resource-bounded data compression).
- + denotes string concatenation. There should be some separator so the boundaries between strings are clear.
- dataset is the collection of data
- categorizer_k is a program that returns "true" or "false" depending on whether the input fits category #k
When learning a new category, find new_categorizer to minimize K(dataset + categorizer_1 + categorizer_2 + ... + categorizer_n + new_categorizer) while still matching the given examples.
Note that while in this example we learn categorizers, in general it should be possible to learn arbitrary functions including probabilistic functions.
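Here's a sketch of the selection step in Python, using zlib's compressed length as a crude, computable stand-in for Kolmogorov complexity. The post only asks for *some* resource-bounded approximation; zlib, the separator, and these function names are all my own choices:

```python
# Approximate the minimization step with a real compressor. zlib is
# far too weak a compressor to be used in earnest; it only shows the
# shape of the computation.
import zlib

SEP = b"\x00"

def approx_K(*parts: bytes) -> int:
    """Approximate K(part_1 + part_2 + ...) by compressed length."""
    return len(zlib.compress(SEP.join(parts), 9))

def pick_categorizer(dataset: bytes, existing: list[bytes],
                     candidates: list[bytes]) -> bytes:
    """Among candidates that already match the labeled examples, pick
    the one minimizing K(dataset + existing categorizers + candidate)."""
    return min(candidates, key=lambda c: approx_K(dataset, *existing, c))
```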
The fact that the categorizers are compressed along with the dataset will create a bias towards categorizers that use concepts useful in compressing the dataset and categorizing other things. From looking at enough data, the concept of "person" naturally arises (in the form of a recognizer/generative model/etc), and it will be used both to compress the dataset and to recognize the "person" category. In effect, because the "person" concept is useful for compressing the dataset, it will be cheap/simple to use in categorizers (such as to recognize real smiling faces).
A useful concept here is "relative complexity" (I don't know the standard name for this), defined as K(x|y) = K(x + y) - K(y). Intuitively this is how complex x is if you already understand y. The categorizer should be trusted in inverse proportion to its relative complexity K(categorizer | dataset and other categorizers); more complex (relative to the data) categorizers are more arbitrary, even given concepts useful for understanding the dataset, and so they're more likely to be wrong on new data.
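The same zlib stand-in gives a computable version of this relative complexity (again a sketch under the same caveats, not a serious approximation of K):

```python
# Approximate K(x|y) = K(x + y) - K(y) by compressing with and
# without the conditioning string.
import zlib

def approx_relative_K(x: bytes, y: bytes) -> int:
    C = lambda s: len(zlib.compress(s, 9))
    return C(x + y) - C(y)

# Per the post, a categorizer scoring low on
# approx_relative_K(categorizer, dataset) reuses structure already
# needed for the data, and should be trusted more on new examples.
```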
If we can use this setup to learn "magical" categories, then Friendly AI becomes much easier. CEV requires the magical concepts "person" and "volition" to be plugged in. So do all seriously proposed complete moral systems. I see no way of doing Friendly AI without having some representation of these magical categories, either provided by humans or learned from data. It should be possible to learn deontological concepts such as "obligation" or "right", and also consequentialist concepts such as "volition" or "value". Some of these are 2-place predicates so they're categories over pairs. Then we can ask new questions such as "Do I have a right to do x in y situation?" All of this depends on whether the relevant concepts have low complexity relative to the dataset and other categorizers.
Using this framework for Friendly AI has many problems. I'm hand-waving the part about how to actually compress the data (approximating Kolmogorov complexity). This is a difficult problem but luckily it's not specific to Friendly AI. Another problem is that it's hard to go from categorizing data to actually making decisions. This requires connecting the categorizer to some kind of ontology. The categorization question that we can actually give examples for would be something like "given this description of the situation, is this action good?". Somehow we have to provide examples of (description,action) pairs that are good or not good, and the AI has to come up with a description of the situation before deciding whether the action is good or not. I don't think that using exactly this framework to make Friendly AI is a good idea; my goal here is to argue that sufficiently advanced machine learning can learn magical categories.
If it is in fact possible to learn magical categories, this suggests that machine learning research (especially related to approximations of Solomonoff induction/Kolmogorov complexity) is even more necessary for Friendly AI than it is for unFriendly AI. I think that the main difficulty of Friendly AI as compared with unFriendly AI is the requirement of understanding magical concepts/categories. Other problems (induction, optimization, self-modification, ontology, etc.) are also difficult but luckily they're almost as difficult for paperclip maximizers as they are for Friendly AI.
This has a relationship to the orthogonality thesis. Almost everyone here would agree with a weak form of the orthogonality thesis: that there exist general optimizer AI programs into which you can plug any goal (such as paperclip maximization). A stronger form of the orthogonality thesis asserts that all ways of making an AI can be easily reduced to specifying its goals and optimization separately; that is, K(AI) ~= K(arbitrary optimizer) + K(goals). My thesis here (that magical categories are simpler relative to data) suggests that the strong form is false. Concepts such as "person" and "value" have important epistemic/instrumental value and can also be used to create goals, so K(Friendly AI) < K(arbitrary optimizer) + K(Friendliness goal). There's really no problem with human values being inherently complex if they're not complex relative to data we can provide to the AI or information it will create on its own for instrumental purposes. Perhaps P(Friendly AI | AGI, passes some Friendliness tests) isn't actually so low even if the program is randomly generated (though I don't actually suggest taking this approach!).
I'm personally working on a programming language for writing and verifying generative models (proving lower bounds on P(data|model)). Perhaps something like this could be used to compress data and categories in order to learn magical categories. If we can robustly learn some magical categories even with current levels of hardware/software, that would be strong evidence for the possibility of creating Friendly AI using this approach, and evidence against the molecular smiley face scenario.
The following is the first draft of my efforts. It's about half as long as the original. It cuts out the section about the Shadowy Figure, which I'm slightly upset about, in particular because it would have made the "beyond the reach of God" line stronger. But I felt that if I tried to include it at all, I had to include several paragraphs that ran a little too long.
I attempted at first to convert it to a "true" poem (not rhyming, but going for a particular meter). I later decided that too much of it needed to have a conversational quality, so it's more of a short play than a poem. Lines are broken up in a particular way to suggest timing and to make it easier to read out loud.
I wanted a) to share the results with people on the chance that someone else might want to perform a little six-minute dialog (my test run clocked in at 6:42), and b) to get feedback on how I chose to abridge things. Do you think there were important sections that could be tied in without making it too long? Do you think some sections that I reworded could be reworded better, or that I missed some?
Edit: I've addressed most of the concerns people had. I think I'm happy with it, at least for my purposes. If people are still concerned by the ending I'll revise it, but I think I've set it up better now.
The Gift We Give Tomorrow
How, oh how could the universe,
itself unloving, and mindless,
cough up creatures capable of love?
No mystery in that.
It's just a matter
of natural selection.
But natural selection is cruel. Bloody.
And bloody stupid!
Even when organisms aren't directly tearing at each other's throats…
…there's a deeper competition, going on between the genes.
A species could evolve to extinction,
if the winning genes were playing negative-sum games.
How could a process,
Cruel as Azathoth,
Create minds that were capable of love?
Mystery is a property of questions.
A mother's child shares her genes,
And so a mother loves her child.
But mothers can adopt children.
And still, come to love them.
Still no mystery.
Evolutionary psychology isn't about deliberately maximizing fitness.
Through most of human history,
we didn't know genes existed.
Well, fine. But still:
Humans form friendships,
even with non-relatives.
How can that be?
Ancient hunter-gatherers would often play the Iterated Prisoner's Dilemma.
There could be profit in betrayal.
But the best solution:
was reciprocal altruism.
The most dangerous human is not the strongest,
or even the smartest:
But the one who has the most allies.
But not all friends are fair-weather friends;
there are true friends -
those who would sacrifice their lives for another.
Shouldn't that kind of devotion
remove itself from the gene pool?
You said it yourself:
We have a concept of true friendship and fair-weather friendship.
We wouldn't be true friends with someone who we didn't think was a true friend to us.
And one with many true friends?
They are far more formidable
than one with mere fair-weather allies.
And Mohandas Gandhi,
who really did turn the other cheek?
Those who try to serve all humanity,
whether or not all humanity serves them in turn?
That’s a more complex story.
Humans aren’t just social animals. We’re political animals.
Sometimes the most formidable human is not the strongest,
but the one who skillfully argues that their preferred policies
match the preferences of others.
How does that explain Gandhi?
The point is that we can argue about 'What should be done?'
We can make those arguments and respond to them.
Without that, politics couldn't take place.
Okay... but Gandhi?
Believed certain complicated propositions about 'What should be done?'
Then did them.
That sounds suspiciously like it could explain any possible human behavior.
If we traced back the chain of causality,
through all the arguments...
We'd find a moral architecture.
The ability to argue abstract propositions.
A preference for simple ideas.
An appeal to hardwired intuitions about fairness.
A concept of duty. Aversion to pain.
Filtered by memetic selection,
all of this resulted in a concept:
"You should not hurt people,"
In full generality.
And that gets you Gandhi.
What else would you suggest?
Some godlike figure?
Reaching out from behind the scenes?
Hell no. But -
Because then I’d have to ask:
How did that god originally decide that love was even desirable?
How did it get preferences that included things like friendship, loyalty, and fairness?
Call it 'surprising' all you like.
But through evolutionary psychology,
You can see how parental love, romance, honor, even true altruism and moral arguments,
all bear the specific design signature of natural selection.
If there were some benevolent god, reaching out to create a world of loving humans,
it too must have evolved,
defeating the point of postulating it at all.
I'm not postulating a god!
I'm just asking how human beings ended up so nice.
Nice? Have you looked at this planet lately?
We bear all those other emotions that evolved as well.
Which should make it very clear that we evolved, should you begin to doubt it.
Humans aren't always nice.
But, still, come on...
doesn't it seem a little...
That nothing but millions of years of a cosmic death tournament…
could cough up mothers and fathers,
sisters and brothers,
husbands and wives,
true altruists and guardians of causes,
police officers and loyal defenders,
even artists, sacrificing themselves for their art?
All practicing so many kinds of love?
For so many things other than genes?
Doing their part to make their world less ugly,
something besides a sea of blood and violence and mindless replication?
Are you honestly surprised by this?
If so, question your underlying model.
For it's led you to be surprised by the true state of affairs.
Since the very beginning,
not one unusual thing has ever happened.
But how are you NOT amazed?
Maybe there’s no surprise from a causal viewpoint.
But still, it seems to me, in the creation of humans by evolution,
something happened that is precious and marvelous and wonderful.
If we can’t call it a physical miracle, then call it a moral miracle.
Because it was only a miracle from the perspective of the morality that was produced?
Explaining away all the apparent coincidence,
from a causal and physical perspective?
Well... yeah. I suppose you could interpret it that way.
I just meant that something was immensely surprising and wonderful on a moral level,
even if it's not really surprising,
on a physical level.
I think that's what I said.
It just seems to me that in your view, somehow you explain that wonder away.
I explain it.
Of course there's a story behind love.
Behind all ordered events, one finds ordered stories.
And that which has no story is nothing but random noise.
Hardly any better.
If you can't take joy in things with true stories behind them,
your life will be empty.
Love has to begin somehow.
It has to enter the universe somewhere.
It’s like asking how life itself begins.
Though you were born of your father and mother,
and though they arose from their living parents in turn,
if you go far and far and far away back,
you’ll finally come to a replicator that arose by pure accident.
The border between life and unlife.
So too with love.
A complex pattern must be explained by a cause
that’s not already that complex pattern. For love to enter the universe,
it has to arise from something that is not love.
If that weren’t possible, then love could not be.
Just as life itself required that first replicator,
to come about by accident,
but still caused:
far, far back in the causal chain that led to you:
3.8 billion years ago,
in some little tidal pool.
Perhaps your children's children will ask,
how it is that they are capable of love.
And their parents will say:
Because we, who also love, created you to love.
And your children's children may ask: But how is it that you love?
And their parents will reply:
Because our own parents,
who loved as well,
created us to love in turn.
And then your children's children will ask:
But where did it all begin?
Where does the recursion end?
And their parents will say:
Once upon a time, long ago and far away,
there were intelligent beings who were not themselves intelligently designed.
Once upon a time, there were lovers,
created by something that did not love.
Once upon a time,
when all of civilization was a single galaxy,
A single star.
A single planet.
A place called Earth.
Ever So Long Ago.
For those not familiar with the topic, Torture vs. Dustspecks asks the question: "Would you prefer that one person be horribly tortured for fifty years without hope or rest, or that 3^^^3 people get dust specks in their eyes?"
Most of the discussion I have seen on the topic adopts one of two assumptions in deriving its answer to that question. I think of the first as the 'linear additive' answer: torture is the proper choice for the utilitarian consequentialist, because a single person can only suffer so much over a fifty-year window, compared to the incomprehensible number of individuals who each suffer only minutely. I think of the second as the 'logarithmically additive' answer, which inverts the choice on the grounds that different forms of suffering are not equal and cannot be added as simple 'units'.
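To make the contrast concrete, here is a toy sketch; every quantity in it is an invented illustration, and 1e100 merely stands in for 3^^^3, which no computer could represent:

```python
import math

# Illustrative assumptions only: the "suffering units" are invented numbers.
TORTURE = 1e9    # hypothetical disutility of fifty years of torture
SPECK = 1e-6     # hypothetical disutility of one dust speck
N = 1e100        # stand-in for 3^^^3 (the real number dwarfs this)

def linear_total(per_harm, n):
    """Linear additivity: tiny harms sum as interchangeable units."""
    return per_harm * n

def log_total(per_harm, n):
    """One 'logarithmically additive' reading: qualitatively trivial harms
    saturate instead of accumulating without bound."""
    return math.log1p(per_harm * n)

print(linear_total(SPECK, N) > TORTURE)  # True: summed specks outweigh the torture, so pick torture
print(log_total(SPECK, N) > TORTURE)     # False: saturated specks never outweigh it, so pick specks
```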
What I have never yet seen is something akin to the notion expressed in Ursula K. Le Guin's The Ones Who Walk Away From Omelas. If you haven't read it, I won't spoil it for you.
I believe that any metric of consequence which takes into account only suffering when making the choice of "torture" vs. "dust specks" misses the point. There are consequences to such a choice that extend beyond the suffering inflicted: moral responsibility, the standards of behavior that either choice makes acceptable, and so on. Any solution to the question which ignores these elements might be useful in revealing one's views about the nature of cumulative suffering, but beyond that it is of no value in making practical decisions -- it cannot be, because 'consequence' extends beyond the mere instantiation of a given choice (the exact pain inflicted by either scenario) into the kind of society that such a choice would result in.
While I myself tend more towards the 'logarithmic' than the 'linear' additive view of suffering, even if I stipulate the linear additive view, I still cannot agree with the conclusion of torture over the dust speck, for the same reason I do not condone torture even in the "ticking time bomb" scenario: I cannot accept the culture/society that would permit such a torture to exist. To arbitrarily select out one individual for maximal suffering in order to spare others a negligible amount would require a legal or moral framework that accepted such choices, and this violates the principle of individual self-determination -- a principle I have seen Less Wrong's community spend a great deal of time trying to incorporate into Friendliness solutions for AGI. We as a society already implement something similar to this, economically: we accept taxing everyone, even according to a graduated scheme. What we do not accept is enslaving 20% of the population to provide for the needs of the State.
If there is a flaw in my reasoning here, please enlighten me.
Human values seem to be at least partly selfish. While it would probably be a bad idea to build AIs that are selfish, ideas from AI design can perhaps shed some light on the nature of selfishness, which we need to understand if we are to understand human values. (How does selfishness work in a decision-theoretic sense? Do humans actually have selfish values?) Current theory suggests three possible ways to design a selfish agent (sketched in toy code after the list):
- have a perception-determined utility function (like AIXI)
- have a static (unchanging) world-determined utility function (like UDT) with a sufficiently detailed description of the agent embedded in the specification of its utility function at the time of the agent's creation
- have a world-determined utility function that changes ("learns") as the agent makes observations (for concreteness, let's assume a variant of UDT where you start out caring about everyone, and each time you make an observation, your utility function changes to no longer care about anyone who hasn't made that same observation)
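Here is a minimal, purely illustrative sketch of the three designs; the classes and the stub World are hypothetical simplifications, not real AIXI or UDT implementations:

```python
# Toy sketch of the three selfish-agent designs; everything here is a
# hypothetical simplification for illustration.
from dataclasses import dataclass, field

@dataclass
class World:
    welfare: dict                                   # agent name -> welfare score
    observers: dict = field(default_factory=dict)   # observation -> set of names

class PerceptionUtility:
    """Type 1 (AIXI-like): utility is a function of the perception stream."""
    def utility(self, perceived_rewards):
        return sum(perceived_rewards)

class StaticWorldUtility:
    """Type 2 (UDT-like): a fixed world-utility with a detailed description
    of the agent embedded at creation time and never updated."""
    def __init__(self, my_description):
        self.me = my_description
    def utility(self, world):
        return world.welfare[self.me]

class LearningWorldUtility:
    """Type 3: starts out caring about everyone; each observation drops from
    the care set anyone who hasn't made that same observation."""
    def __init__(self, everyone):
        self.cared_about = set(everyone)
    def observe(self, observation, world):
        self.cared_about &= world.observers.get(observation, set())
    def utility(self, world):
        return sum(world.welfare[a] for a in self.cared_about)

# Hypothetical usage: after observing "saw_red_door", the type 3 agent stops
# caring about anyone who didn't make that same observation.
w = World(welfare={"alice": 5, "bob": 3}, observers={"saw_red_door": {"alice"}})
agent = LearningWorldUtility(["alice", "bob"])
agent.observe("saw_red_door", w)
print(agent.utility(w))  # 5: bob no longer counts
```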
Note that 1 and 3 are not reflectively consistent (they both refuse to pay the Counterfactual Mugger), and 2 is not applicable to humans (since we are not born with detailed descriptions of ourselves embedded in our brains). Still, it seems plausible that humans do have selfish values, either because we are type 1 or type 3 agents, or because we were type 1 or type 3 agents at some time in the past, but have since self-modified into type 2 agents.
But things aren't quite that simple. According to our current theories, an AI would judge its decision theory using that decision theory itself, and self-modify if it was found wanting under its own judgement. But humans do not actually work that way. Instead, we judge ourselves using something mysterious called "normativity" or "philosophy". For example, a type 3 AI would just decide that its current values can be maximized by changing into a type 2 agent with a static copy of those values, but a human could perhaps think that changing values in response to observations is a mistake, and they ought to fix that mistake by rewinding their values back to before they were changed. Note that if you rewind your values all the way back to before you made the first observation, you're no longer selfish.
So, should we freeze our selfish values, or rewind our values, or maybe even keep our "irrational" decision theory (which could perhaps be justified by saying that we intrinsically value having a decision theory that isn't too alien)? I don't know what conclusions to draw from this line of thought, except that on close inspection, selfishness may offer just as many difficult philosophical problems as altruism.
Let it be noted, as an aside, that this is my first post on Less Wrong and my first attempt at original, non-mandatory writing for over a year.
I've been reading through the original sequences over the last few months as part of an attempt to get my mind into working order. (Other parts of this attempt include participating in Intro to AI and keeping a notebook.) The realization that spurred me to attempt this: I don't feel that living is good. The distinction which seemed terribly important to me at the time was that I didn't feel that death was bad, which is clearly not sensible. I don't have the resources to feel the pain of one death 155,000 times every day, which is why Torture v. Dust Specks is a nonsensical question to me and why I don't have a cached response for how to act on the knowledge of all those deaths.
The first time I read Torture v. Dust Specks, I started really thinking about why I bother trying to be rational. What's the point, if I still have to make nonsensical, kitschy statements like "Well, my brain thinks X but my heart feels Y," if I would not reflexively flip the switch and may even choose not to, and if I sometimes feel that a viable solution to overpopulation is more deaths?
I solved the lattermost with extraterrestrial settlement, but it's still, well, sketchy. My mind is clearly full of some pretty creepy thoughts, and rationality doesn't seem to be helping. I think about having that feeling and go eeugh, but the feelings are still there. So I pose the question: what does a person do to click that death is really, really bad?
The primary arguments I've heard for death are:
- "I look forward to the experience of shutting down and fading away," which I hope could be easily disillusioned by gaining knowledge about how truly undignified dying is, bloody romanticists.
- "There is something better after life and I'm excited for it," which, well... let me rephrase: please do not turn this into a discussion on ways to disillusion theists because it's really been talked about before.
- "It is Against Nature/God's Will/The Force to live forever. Nature/God/the Force is going to get humankind if we try for immortality. I like my liver!" This argument is so closely related to the previous and the next one that I don't know quite how to respond to it, other than that I've seen it crop up in historical accounts of any big change. Human beings tend to be really frightened of change, especially change which isn't believed to be supernatural in origin.
- "I've read science fiction stories about being immortal, and in those stories immortality gets really boring, really fast. I'm not interested enough in reality to be in it forever." I can't see where this perspective could come from other than mind-numbing ignorance/the unimaginable nature of really big things (like the number of languages on Earth, the amount of things we still don't know about physics or the fact that every person who is or ever will be is a new, interesting being to interact with.)
- "I can't imagine being immortal. My idea about how my life will go is that I will watch my children grow old, but I will die before they do. My mind/human minds aren't meant to exist for longer than one generation." This fails to account for human minds being very, very flexible. The human mind as we know it now does eventually get tired of life (or at least tired of pain,) but this is not a testament to how minds are, any more than humans becoming distressed when they don't eat is a testament to it being natural to starve, become despondent and die.
- "The world is overpopulated and if nobody dies, we will overrun and ultimately ruin the planet." First of all: I, like Dr. Ian Malcolm, think that it is incredibly vain to believe that man can destroy the Earth. Second of all: in the future we may have anything from extraterrestrial habitation to substrates which take up space and consume material in totally different ways. But! Clearly, I am not feeling these arguments, because this argument makes sense to me. Problematic!
I think that overall, the fear most people have about signing up for cryonics/AI/living forever is that they do not understand it. This is probably true for me; it's probably why I don't grok that life is good, always. Moreover, it is probable that the depictions of death as not always bad with which I sympathize (e.g. 'Lord, what can the harvest hope for, if not for the care of the Reaper Man?') stem from the fact that death has previously been held to be absolute. That is, up until the last ~30 years, people have not been having cogent, non-hypothetical thoughts about how it might be possible to not die or what that might be like. Dying has always been a Big Bad but an inescapable one, and the human race has a bad case of Stockholm Syndrome.
So: now that I know what I have and what I want, how do I use the former to get the latter?
A trolley (i.e. in British English a tram) is running out of control down a track. In its path are five people who have been tied to the track by a mad philosopher. Fortunately, you could flip a switch, which will lead the trolley down a different track to safety. Unfortunately, there is a single person tied to that track. Should you flip the switch or do nothing?
Participants with one kind of serotonin transporter (LL-homozygotes) judged flipping the switch to be better than a morally neutral action. Participants with the other kind (S-carriers) judged flipping the switch to be no better than a morally neutral action. The groups responded identically to the "fat man" scenario, both rejecting the 'push' option.
We hypothesized that 5-HTTLPR genotype would interact with intentionality in respondents who generated moral judgments. Whereas we predicted that all participants would eschew intentionally harming an innocent for utilitarian gains, we predicted that participants' judgments of foreseen but unintentional harm would diverge as a function of genotype. Specifically, we predicted that LL homozygotes would adhere to the principle of double effect and preferentially select the utilitarian option to save more lives despite unintentional harm to an innocent victim, whereas S-allele carriers would be less likely to endorse even unintentional harm. Results of behavioral testing confirmed this hypothesis.
Participants in this study judged the acceptability of actions that would unintentionally or intentionally harm an innocent victim in order to save others' lives. An analysis of variance revealed a genotype × scenario interaction, F(2, 63) = 4.52, p = .02. Results showed that, relative to long allele homozygotes (LL), carriers of the short (S) allele showed particular reluctance to endorse utilitarian actions resulting in foreseen harm to an innocent individual. LL genotype participants rated perpetrating unintentional harm as more acceptable (M = 4.98, SEM = 0.20) than did SL genotype participants (M = 4.65, SEM = 0.20) or SS genotype participants (M = 4.29, SEM = 0.30).
The results indicate that inherited variants in a genetic polymorphism that influences serotonin neurotransmission influence utilitarian moral judgments as well. This finding is interpreted in light of evidence that the S allele is associated with elevated emotional responsiveness.
The great moral philosopher Jeremy Bentham, founder of utilitarianism, famously said, 'The question is not, "Can they reason?" nor, "Can they talk?" but rather, "Can they suffer?"' Most people get the point, but they treat human pain as especially worrying because they vaguely think it sort of obvious that a species' ability to suffer must be positively correlated with its intellectual capacity.
Nevertheless, most of us seem to assume, without question, that the capacity to feel pain is positively correlated with mental dexterity - with the ability to reason, think, reflect and so on. My purpose here is to question that assumption. I see no reason at all why there should be a positive correlation. Pain feels primal, like the ability to see colour or hear sounds. It feels like the sort of sensation you don't need intellect to experience. Feelings carry no weight in science but, at the very least, shouldn't we give the animals the benefit of the doubt?
I can see a Darwinian reason why there might even be a negative correlation between intellect and susceptibility to pain. I approach this by asking what, in the Darwinian sense, pain is for. It is a warning not to repeat actions that tend to cause bodily harm. Don't stub your toe again, don't tease a snake or sit on a hornet, don't pick up embers however prettily they glow, be careful not to bite your tongue. Plants have no nervous system capable of learning not to repeat damaging actions, which is why we cut live lettuces without compunction.
It is an interesting question, incidentally, why pain has to be so damned painful. Why not equip the brain with the equivalent of a little red flag, painlessly raised to warn, "Don't do that again"?
[...] my primary question for today: would you expect a positive or a negative correlation between mental ability and ability to feel pain? Most people unthinkingly assume a positive correlation, but why?
Isn't it plausible that a clever species such as our own might need less pain, precisely because we are capable of intelligently working out what is good for us, and what damaging events we should avoid? Isn't it plausible that an unintelligent species might need a massive wallop of pain, to drive home a lesson that we can learn with less powerful inducement?
At the very least, I conclude that we have no general reason to think that non-human animals feel pain less acutely than we do, and we should in any case give them the benefit of the doubt. Practices such as branding cattle, castration without anaesthetic, and bullfighting should be treated as morally equivalent to doing the same thing to human beings.
Imagine a being so vast and powerful that its theory of mind of other entities would itself be a sentient entity. If this entity came across human beings, it might model those people at such a level of resolution that every imagination it has of them would itself be conscious.
Just as we do not grant rights to our thoughts, or to the bacteria that make up a big part of our body, such an entity might be unable to grant existential rights to its thought processes, even if those processes are detailed enough that its mere perception of a human being incorporates a human-level simulation.
But even for us humans it might not be possible to account for every being in our ethical conduct. It might not work to grant everything the rights it deserves. Nevertheless, the answer cannot be to abandon morality altogether, if only for the reason that human nature won't permit it: it is part of our preferences to be compassionate.
Our task must be to free ourselves . . . by widening our circle of compassion to embrace all living creatures and the whole of nature and its beauty.
— Albert Einstein
How do we solve this dilemma? Right now it's relatively easy to handle: there are humans, and then there is everything else. But even today — without uplifted animals, artificial intelligence, human-level simulations, cyborgs, chimeras and posthuman beings — it is increasingly hard to draw the line, for science is advancing rapidly, allowing us to keep alive people with severe brain injury or to save a premature fetus whose mother is already dead. Then there are the mentally disabled and other humans who are not neurotypical. We are also increasingly becoming aware that many non-human beings on this planet are far more intelligent and cognizant than expected.
And remember: what will be the case in the future has already been the case in our not-too-distant past. There was a time when three different human species lived at the same time on the same planet: three intelligent species of the Homo genus, yet very different. Only 22,000 years ago we, H. sapiens, were still sharing this oasis of life with Homo floresiensis and Homo neanderthalensis.
How would we handle such a situation today? At a time when we still haven't learnt to live together in peace. At a time when we are still killing even our own genus. Most of us are not even ready to become vegetarian in the face of global warming, although livestock farming accounts for 18% of the planet's greenhouse gas emissions.
So where do we draw the line?
The Moral Psychology Handbook (2010), edited by John Doris, is probably the best way to become familiar with the exciting interdisciplinary field of moral psychology. The chapters are written by philosophers, psychologists, and neuroscientists. A few of them are all three, and the university department to which they are assigned is largely arbitrary.
I should also note that the chapter authors happen to comprise a large chunk of my own 'moral philosophers who don't totally suck' list. The book is also exciting because it undermines or outright falsifies a long list of popular philosophical theories with - gasp! - empirical evidence.
Chapter 1: Evolution of Morality (Machery & Mallon)
The authors examine three interpretations of the claim that morality evolved. The claims "Some components of moral psychology evolved" and "Normative cognition is a product of evolution" are empirically well-supported but philosophically uninteresting. The stronger claim that "Moral cognition (a kind of normative cognition) evolved" is more philosophically interesting, but at present not strongly supported by the evidence (according to the authors).
The chapter serves as a compact survey of recent models for the evolution of morality in humans (Joyce, Hauser, de Waal, etc.), and attempts to draw philosophical conclusions about morality from these descriptive models (e.g. Joyce, Street).
Chapter 2: Multi-system Moral Psychology (Cushman, Young, & Greene)
The authors survey the psychological and neuroscientific evidence showing that moral judgments are both intuitive/affective/unconscious and rational/cognitive/conscious, and propose a dual-process theory of moral judgment. Scientific data is used to verify or falsify philosophical theories proposed as, for example, explanations for trolley-problem cases.
Consequentialist moral judgments are more associated with rational thought than deontological judgments are, but both deontological and consequentialist moral judgments have their sources in emotion. Deontological judgments are associated with 'alarm bell' emotions that circumvent reasoning and provide absolute demands on behavior. Alarm bell emotions are rooted in (for example) the amygdala. Consequentialist judgments are associated with 'currency' emotions that provide negotiable motivations weighing for and against particular behaviors, and are rooted in meso-limbic regions that track a stimulus' reward magnitude, reward probability, and expected value.
This chapter might be the best one in the book.
Chapter 3: Moral Motivation (Schroeder, Roskies, & Nichols)
The authors categorize philosophical theories of moral motivation into four groups:
- Instrumentalists think people are motivated when they form beliefs about how to satisfy pre-existing desires.
- Cognitivists think people are motivated merely by the belief that something is right or wrong.
- Sentimentalists think people are morally motivated only by emotions.
- Personalists think people are motivated by their character: their knowledge of good and bad, their wanting for good or bad, their emotions about good or bad, and their habits of responding to these three.
The authors then argue that the neuroscience of motivation fits best with the instrumentalist and personalist pictures of moral motivation, poses some problems for sentimentalists, and presents grave problems for cognitivists. The main weakness of the chapter is that its picture of the neuroscience of motivation is mostly drawn from a decade-old neuroscience textbook. As such, the chapter misses many new developments, especially the important discoveries occurring in neuroeconomics. Still, I can personally attest that the latest neuroscience comes down most strongly in favor of instrumentalists and personalists, though there are recent details that could have been included in this chapter.
Chapter 4: Moral Emotions (Prinz & Nichols)
The authors survey studies that illuminate the role of emotions in moral cognition, and discuss several models that have been proposed, concluding that the evidence currently respects each of them. They then focus on a more detailed discussion of two emotions that are particularly causal in the moral judgments of Western society: anger and guilt.
The chapter is strong in example experiments, but a higher-level discussion of the role of emotions in moral judgment is provided by chapter 2.
Chapter 5: Altruism (Stich, Doris, & Roedder)
The authors distinguish four kinds of desires: (1) desires for pleasure and avoiding pain, (2) self-interested desires, (3) desires that are neither self-interested nor for the well-being of others, and (4) desires for the well-being of others. Psychological hedonism maintains that all (terminal, as opposed to instrumental) desires are of type 1. Psychological egoism says that all desires are of type 2 (which includes type 1). Altruism claims that some desires fall into category 4. And if there are desires of type 3 but none of type 4, then both egoism and altruism are false.
The authors survey evolutionary arguments for and against altruism, but are not yet convinced by any of them.
Psychology, however, does support the existence of altruism, which seems to be "the product of an emotional response to another's distress." The authors survey the experimental evidence, especially the work of Batson. They conclude there is significant support for the existence of genuine human altruism. We are not motivated by selfishness alone.
Chapter 6: Moral Reasoning (Harman, Mason, & Sinnott-Armstrong)
The authors clarify the roles of conscious and unconscious moral reasoning, and reject one popular theory of moral reasoning: the deductive model. One of many reasons for their rejection of the deductive model is that it assumes we come to explicit moral conclusions by applying logic, probability theory, and decision theory to pre-existing moral principles, but in the deductive model these principles are understood in terms of psychological theories of concepts that are probably false. The authors survey the 'classical view of concepts' (concepts as defined in terms of necessary and sufficient conditions) and conclude that it is less likely to be true than alternate theories of mental concepts that are less friendly to the deductive model of moral reasoning.
The authors propose an alternate model of moral reasoning whereby one makes mutual adjustments to one's beliefs and plans and values in pursuit of what Rawls called 'reflective equilibrium.'
Chapter 7: Moral Intuitions (Sinnott-Armstrong, Young, & Cushman)
The authors refer to moral intuitions as "strong, stable, immediate moral beliefs." The 'immediate' part means that these moral beliefs do not arise through conscious reasoning; the subject is conscious only of the resulting moral belief.
Their project is this:
...moral intuitions are unreliable to the extent that morally irrelevant factors affect moral intuitions. When they are distorted by irrelevant factors, moral intuitions can be likened to mirages or seeing pink elephants while one is on LSD. Only when beliefs arise in more reputable ways do they have a fighting chance of being justified. Hence we need to know about the processes that produce moral intuitions before we can determine whether moral intuitions are justified.
Thus the chapter engages in something like Less Wrong-style 'dissolution to algorithm.'
A major weakness of this chapter is that it focuses on understanding intuitions as attribute-substitution heuristics, but ignores the other two major sources of intuitive judgments: evolutionary psychology and unconscious associative learning.
Chapter 8: Linguistics and Moral Theory (Roedder & Harman)
This chapter examines the 'linguistic analogy' in moral psychology - the analogy between Chomsky's 'universal grammar' and what has been called 'universal moral grammar.' The authors don't have any strong conclusions, but instead suggest that the linguistic analogy may be a helpful framework for pursuing further research. They list five ways in which the analogy is useful. This chapter can be skipped without missing much.
Chapter 9: Rules (Mallon & Nichols)
The authors survey the evidence that moral rules "are mentally represented and play a causal role in the production of judgment and behavior." This may be obvious, but it's nice to have the evidence collected somewhere.
Chapter 10: Responsibility (Knobe & Doris)
This chapter surveys the experimental studies that test people's attributions of moral responsibility. In short, people do not make such judgments according to invariant principles, as assumed by most of 20th century moral philosophy. (Moral philosophers have spent most of their time trying to find a set of principles that accounted for people's ordinary moral judgments, and showing that alternate sets of principles failed to capture people's ordinary moral judgments in particular circumstances.)
People adopt different moral criteria for judging different cases, even when they verbally endorse a simple set of abstract principles. This should not be surprising, as the same had already been shown to be true in linguistics and in non-moral judgment. The chapter surveys the variety of ways in which people adopt different moral criteria for different cases.
Chapter 11: Character (Merritt, Doris, & Harman)
This chapter surveys the evidence from situationist psychology, which undermines the 'robust character traits' view of human psychology upon which many varieties of virtue ethics depend.
Chapter 12: Well-Being (Tiberius & Plakias)
This chapter surveys competing concepts of 'well-being' in psychology, and provides reasons for using the 'life satisfaction' concept of well-being, especially in philosophy. The authors then discuss life satisfaction and normativity, for example the worry about the arbitrariness of the factors that lead to human life satisfaction.
Chapter 13: Race and Racial Cognition (Kelly, Machery, & Mallon)
I didn't read this chapter.
If our morality is complex and directly tied to what's human—if we're seeking to avoid building paperclip maximizers—how do you judge and quantify the danger in training yourself to become more rational, if doing so should drift you away from being human?
My friend is a skeptical theist. She, for instance, scoffs mightily at Camping's little dilemma/psychosis, but then argues from a position of comfort that the Rapture is a silly thing to predict because it's clearly stated that no one will know the day. And then she gives me a confused look, because the psychological dissonance is clear.
On one hand, my friend is in a prime position to take forward steps to self-examination and holding rational belief systems. On the other hand, she's an opera singer whose passion and profession require her to be able to empathize with and explore highly irrational human experiences. Since rationality is the art of winning, nobody can deny that the option that lets you have your cake and eat it too is best, but how do you navigate such a narrows?
In another example, a recent comment thread suggested the dangers of embracing human tendencies: catharsis might lead to promoting further emotional intensity. At the same time, catharsis is a well-appreciated human communication strategy with roots in the Greek stage. If rational action pulls you away from humanity, away from our complex morality, then how do we judge it worth doing?
The most immediate resolution to this conundrum appears to me to be that human morality has no consistency constraint: we can want to be powerful and able to win while also wanting to retain the human tendencies which directly impinge on that goal. Is there a theory of metamorality which allows you to infer how such tradeoffs should be managed? Or is human morality, as a program, flawed with inconsistencies that lead to inescapable cognitive dissonance and dehumanization? If you interpret morality as a self-supporting strange loop, is it possible to have unresolvable, drifting interpretations based on how you focus your attention?
Dual to the problem of resolving a way forward is the problem of the interpreter. If there is a goal to at least marginally increase the rationality of humanity, but in order to discover the means to do so you have to become less capable of empathizing with and communicating with humanity, who acts as an interpreter between the two divergent mindsets?
I just found 120 Euro (about $172) on the floor in the hallway in a hostel in Berlin. What should I do, and why?
- It's not inconceivable that the hostel might just take the money if I turn it in.
- I'll be at this hostel for about two more days.
Evolution. Morality. Strategy. Security/Cryptography. This hits so many topics of interest, I can't imagine it not being discussed here. Bruce Schneier blogs about his book-in-progress, The Dishonest Minority:
Humans evolved along this path. The basic mechanism can be modeled simply. It is in our collective group interest for everyone to cooperate. It is in any given individual's short-term self interest not to cooperate: to defect, in game theory terms. But if everyone defects, society falls apart. To ensure widespread cooperation and minimal defection, we collectively implement a variety of societal security systems.
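For readers who want the game-theoretic core spelled out, here is a minimal one-shot Prisoner's Dilemma in Python; the payoff numbers are the textbook convention (chosen only to satisfy temptation > reward > punishment > sucker), not anything from Schneier's book:

```python
# One-shot Prisoner's Dilemma: defection dominates individually, yet mutual
# defection pays everyone less than mutual cooperation.
PAYOFF = {  # (my move, their move) -> my payoff
    ("C", "C"): 3,  # R: reward for mutual cooperation
    ("C", "D"): 0,  # S: sucker's payoff
    ("D", "C"): 5,  # T: temptation to defect
    ("D", "D"): 1,  # P: punishment for mutual defection
}

for theirs in ("C", "D"):
    best = max(("C", "D"), key=lambda mine: PAYOFF[(mine, theirs)])
    print(f"If they play {theirs}, my best reply is {best}")
# Both lines print "D": defecting dominates for each individual, which is why
# societies add security systems that change the payoffs toward cooperation.
```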
I am somewhat reminded of Robin Hanson's Homo Hypocritus writings from the above, although it is not the same. Schneier says that the book is basically a first draft at this point, and might still change quite a bit. Some of the comments focus on whether "dishonest" is actually the best term to use for defecting from social norms.
This post is a bit of shameless self-promotion, but also a pointer to an example of Yudkowskian philosophy at work that LWers may enjoy, this time concerning philosophical theories of desire.
Episode 14 of my podcast with Alonzo Fyfe, Morality in the Real World, begins to dissolve some common philosophical debates about the nature of desire by replacing the symbol with the substance, etc. Transcript and links here, mp3 here. The episode can also probably serve as a big hint of where I'm going with my metaethics sequence.
Warning: Alonzo and I are not voice actors, and my sound engineering cannot compare to that of Radiolab.
I'm planning a top-level post (probably two or three or more) on when agent utility should not be part of utilitarian calculations - which seems to be an interesting and controversial topic given some recent posts. I'm looking for additional ideas, and particularly counterarguments. Also hunting for article titles. The series would look something like the following - noting that obviously this summary does not have much room for nuance or background argument. I'm assuming moral antirealism, with the selection of utilitarianism as an implemented moral system.
Intro - Utilitarianism has serious, fundamental measurement problems, and sometimes substantially contradicts our intuitions. One solution is to say our intuitions are wrong - this isn't quite right (i.e. a morality can't be "wrong") unless our intuitions are internally inconsistent, which I do not think is the problem. This is particularly problematic because agents (especially those with high self-modification capacities) may face socially undesirable incentives. I argue that a better solution is to ignore or discount the utility of certain agents in certain circumstances (a toy sketch of this adjusted aggregation follows the outline). This better fits general moral intuitions. (There remains a debate as to whether Morality A might be better than Morality B when Morality B better matches our general intuitions - I don't want to get into this, as I'm not sure there's a non-circular meaning of "better" as applied to morality that does not relate to moral intuitions.)
1 - First, expressly anti-utilitarian utility can be disregarded. Most of the cases of this are fairly simple and bright-line. No matter how much Bob enjoys raping people, the utility he derives from doing so is irrelevant, unless he drinks the utilitarian Kool-Aid and only, for example, engages in rape fantasies (in which case his utility is counted - the issue is not that his desire is bad, it's that his actions are). This gets into some slight line-drawing problems with, for example, utility derived from competition (as one may delight in defeating people - this probably survives, however, particularly since it is all consensual).
1.5 - The above point is also related to the issue of discounting the future utility of such persons; I'm trying to figure out if it belongs in this sequence. The example I plan to use (which makes pretty much the entire point) is as follows. You have some chocolate ice cream you have to give away. You can give it either to a small child or to a person who has just brutally beaten and molested that child. The child kinda likes chocolate ice cream; vanilla is his favorite flavor, but chocolate's OK. The adult absolutely, totally loves chocolate ice cream; it's his favorite food in the world. I, personally, give the kid the ice cream, and I think so does well over 90% of the general population. On the other hand, if the adult were simply someone who had an interest in molesting children, but scrupulously never acted on it, I would not discount his utility so cheerfully. This may simply belong as a separate post on its own on the utility value of punishment. I'd be interested in feedback on it.
2 - Finally, and trickiest, is the problem of utility conditioned on false beliefs. Take two examples: an African village stoning a child to death because they think she's a witch who has made it stop raining, and the same village curing that witch-hood by ritually dunking her in holy water (or by some other innocuous procedure). In the former case, there's massive disutility that occurs because people think it will solve a problem that it won't (I'm also a little unclear on what it would mean for the utility of the many to "outweigh" the utility of the one, but that's an issue I'll address in the intro article). In the latter, there's minimal disutility (maybe even positive utility), even though the ritual is equally impotent. The best answer seems to be that utility conditioned on false beliefs should be ignored to the extent that it is conditioned on false beliefs. Many people (myself included) celebrate religious holidays with no belief whatsoever in the underlying religion - there is substantial value in the gathering of family and community. Similarly, there is some value to the gathering of the community in both village cases; in the murder it doesn't outweigh the costs, in the baptism it very well might.
3 - (tentative) How this approach coincides with the unweighted approach in the long term. Basically, if we ignore certain kinds of utility, we will encourage agents to pursue other kinds of utility (if you can't burn witches to improve your harvest, perhaps you'll learn how to rotate crops better). The utility they pursue is likely to be of only somewhat lower value to them (or higher value in some cases, if they're imperfect, i.e. human). However, it will be of non-negative value to others. Thus, a policy-maker employing adjusted utilitarianism is likely to obtain better outcomes from an unweighted perspective. I'm not sure this point is correct or cogent.
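To make points 1 and 2 of the outline concrete, here is a toy sketch of the adjusted aggregation; the agents, numbers, and the "false-belief share" knob are all illustrative assumptions rather than a worked-out proposal:

```python
# Toy sketch of the adjusted aggregation proposed in points 1 and 2 above.
from dataclasses import dataclass

@dataclass
class UtilityClaim:
    amount: float
    anti_utilitarian: bool = False   # point 1: e.g. utility from acted-on rape
    false_belief_share: float = 0.0  # point 2: fraction conditioned on false beliefs

def adjusted_total(claims):
    total = 0.0
    for c in claims:
        if c.anti_utilitarian:
            continue                                    # disregard entirely (point 1)
        total += c.amount * (1 - c.false_belief_share)  # discount (point 2)
    return total

# The two village cases: similar communal-gathering value, but the stoning's
# value depends almost entirely on the false witch belief, and it has a victim.
stoning = [UtilityClaim(10, false_belief_share=0.9), UtilityClaim(-100)]
dunking = [UtilityClaim(10, false_belief_share=0.5), UtilityClaim(-1)]
print(adjusted_total(stoning), adjusted_total(dunking))  # -99.0 vs. 4.0
```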
I'm aware at least some of this is against Less Wrong canon. I'm curious whether people have counterarguments, objections, counterexamples, or general feedback on whether this would be a desirable series to spell out.
Eliezer mentions two challenges he often gets, "Friendly to who?" and "Oh, so you get to say what 'Friendly' means." At the moment I see only one true answer to these questions, which I give below. If you can propose alternatives in the comments, please do.
I suspect morality is in practice a multiplayer game, so talking about it needs multiple people to be involved. Therefore, let's imagine a dialogue between A and B.
A: Okay, so you're interested in Friendly AI. Who will it be Friendly toward?
B: Obviously the people who participate in making the system will decide how to program it, so they will decide who it is Friendly toward.
A: So the people who make the system decide what "Friendly" means?
B: Yes, in effect.
A: Then they could decide that it will be Friendly only toward them, or toward White people. Isn't that sort of selfishness or racism immoral?
B: I can try to answer questions about the world. If you can define morality in a way that lets me do experiments to discover what is moral and what is immoral, I can try to guess the results of those experiments and report them. What do you mean by morality?
A: I don't know. If it doesn't mean anything, why do people talk about morality so much?
B: People often profess beliefs to label themselves as members of a group. So far as I can tell, the belief that some things are moral and other things are not is one of those beliefs. I don't have any other explanation for why people talk so much about something that isn't subject to experimentation.
A: So if that's what morality is, then it's fundamentally meaningless unless I'm planning out what lies to tell in order to get positive regard from a potential ingroup, or better yet I manage to somehow deceive myself so I can truthfully conform to the consensus morality of my desired ingroup. If that's all it is, there's no constraint on how a Friendly AI works, right? Maybe you'll build it and it will only be Friendly toward you.
B: No, because I can't do it by myself. Suppose I approach you and say "I'm going to make a Friendly AI that lets me control it and doesn't care about anyone else's preference." Would you help me?
A: Obviously not.
B: Nobody else would either, so the only way I can unilaterally run the world with an FAI is to create it by myself, and I'm not up to that. There are a few other proposed notions of Friendliness that are nonviable for similar reasons. For example, suppose I approached you and said, "I'm going to make a Friendly AI that treats everyone fairly, but I don't want to let anybody inspect how it works." Would you help me?
A: No, because I wouldn't trust you. I'd assume that you plan to really make it Friendly only toward yourself, lie about it, and then drop the lie once the FAI had enough power that you didn't need the lie any more.
B: Right. Here's an ethical system that fails another way: "I'll make an FAI that cares about every human equally, no matter what they do." To keep it simple, let's assume that engineering humans to have strange desires for the purpose of manipulating the FAI is not possible. Would you help me build that?
A: Well, it fits with my intuitive notion of morality, but it's not clear what incentive I have to help. If you succeed, I seem to win equally at the end whether I help you or not. Why bother?
B: Right. There are several possible fixes for that. Perhaps if I don't get your help, I won't succeed, and the alternative is that someone else builds it poorly and your quality of life decreases dramatically. That gives you an incentive to help.
A: Not much of one. You'll surely need a lot of help, and maybe if all those other people help I won't have to. Everyone would make the same decision and nobody would help.
B: Right. I could solve that problem by paying helpers like you money, if I had enough money. Another option would be to tilt the Friendliness in the direction of helpers in proportion to how much they help me.
A: But isn't tilting the Friendliness unfair?
B: Depends. Do you want things to be fair?
A: Yes, for some intuitive notion of "fairness" I can't easily describe.
B: So if the AI cares what you want, that will cause it to figure out what you mean by "fair" and tend to make it happen, with that tendency increasing as it tilts more in your favor, right?
A: I suppose so. No matter what I want, if the AI cares enough about me, it will give me more of what I want, including fairness.
B: Yes, that's the best idea I have right now. Here's another alternative: What would happen if we only took action when there's a consensus about how to weight the fairness?
A: Well, 4% of the population are sociopaths. They, and perhaps others, would make ridiculous demands and prevent any consensus. Then we'd be waiting forever to build this thing, and someone else who doesn't care about consensus would move while we're dithering and make us irrelevant. Since we can't wait for a consensus, we'll have to do something reasonable without one, and it makes sense to proceed now. So how about it? Do you need help yet?
B: Nope, I don't know how to make it.
A: Damn. Hmm, do you think you'll figure it out before everybody else?
B: Probably not. There are a lot of everybody else. In particular, business organizations that optimize for profit have a lot of power and have fundamentally inhuman value systems. I don't see how I can take action before all of them.
A: Me either. We are so screwed.