The Preference Utilitarian’s Time Inconsistency Problem
In May of 2007, DanielLC asked at Felicifia, an “online utilitarianism community”:
If preference utilitarianism is about making peoples’ preferences and the universe coincide, wouldn't it be much easier to change peoples’ preferences than the universe?
Indeed, if we were to program a super-intelligent AI to use the utility function U(w) = sum of w’s utilities according to people (i.e., morally relevant agents) who exist in world-history w, the AI might end up killing everyone who is alive now and creating a bunch of new people whose preferences are more easily satisfied, or just use its super intelligence to persuade us to be more satisfied with the universe as it is.
Well, that can’t be what we want. Is there an alternative formulation of preference utilitarianism that doesn’t exhibit this problem? Perhaps. Suppose we instead program the AI to use U’(w) = sum of w’s utilities according to people who exist at the time of decision. This solves Daniel’s problem, but introduces a new one: time inconsistency.
The new AI’s utility function depends on who exists at the time of decision, and as that time changes and people are born and die, its utility function also changes. If the AI is capable of reflection and self-modification, it should immediately notice that it would maximize its expected utility, according to its current utility function, by modifying itself to use U’’(w) = sum of w’s utilities according to people who existed at time T0, where T0 is a constant representing the time of self-modification.
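The difference between U’ and U’’ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-person population, their lifespans, the world labels, and all function names are hypothetical and do not come from the original argument.

```python
from typing import Callable, Dict, List, Tuple

World = str  # a world-history, reduced to a label for illustration

# Hypothetical population: name -> (birth time, death time, utility over worlds).
Person = Tuple[int, int, Callable[[World], float]]
PEOPLE: Dict[str, Person] = {
    "alice": (0, 50, lambda w: 1.0 if w == "status_quo" else 0.0),
    "bob": (40, 90, lambda w: 1.0 if w == "bob_utopia" else 0.0),
}
WORLDS: List[World] = ["status_quo", "bob_utopia"]


def alive_at(t: int) -> List[Person]:
    """People who exist at time t."""
    return [p for p in PEOPLE.values() if p[0] <= t <= p[1]]


def u_prime(world: World, decision_time: int) -> float:
    """U'(w): sum over people alive at the (moving) time of decision."""
    return sum(utility(world) for _, _, utility in alive_at(decision_time))


def make_u_double_prime(t0: int) -> Callable[[World], float]:
    """U''(w): the same sum, with the population frozen at the constant T0."""
    frozen = alive_at(t0)

    def u_double_prime(world: World) -> float:
        return sum(utility(world) for _, _, utility in frozen)

    return u_double_prime


def best_world(utility: Callable[[World], float], worlds: List[World]) -> World:
    """The world an expected-utility maximizer with this function would pick."""
    return max(worlds, key=utility)
```

With these toy numbers, `best_world(lambda w: u_prime(w, 10), WORLDS)` and `best_world(lambda w: u_prime(w, 60), WORLDS)` disagree, because the population being summed over has changed in between. `make_u_double_prime(10)` gives the same answer no matter when it is evaluated, which is why the time-10 agent would prefer to hand control to it.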
The AI is now reflectively consistent, but is this the right outcome? Should the whole future of the universe be shaped only by the preferences of those who happen to be alive at some arbitrary point in time? If you’re a utilitarian in the first place, this is probably not the kind of utilitarianism that you’d want to subscribe to.
So, what is the solution to this problem? Robin Hanson’s approach to moral philosophy may work. It tries to take into account everyone’s preferences (those who lived in the past, those who will live in the future, and those who have the potential to exist but don’t), but I don’t think he has worked out (or written down) the solution in detail. For example, is the utilitarian AI supposed to sum over every logically possible utility function and weight them equally? If not, what weighting scheme should it use?
Perhaps someone can follow up Robin’s idea and see where this approach leads us? Or does anyone have other ideas for solving this time inconsistency problem?
The Moral Status of Independent Identical Copies
Future technologies pose a number of challenges to moral philosophy. One that I think has been largely neglected is the status of independent identical copies. (By "independent identical copies" I mean copies of a mind that do not physically influence each other, but haven't diverged because they are deterministic and have the same algorithms and inputs.) To illustrate what I mean, consider the following thought experiment. Suppose Omega appears to you and says:
You and all other humans have been living in a simulation. There are 100 identical copies of the simulation distributed across the real universe, and I'm appearing to all of you simultaneously. The copies do not communicate with each other, but all started with the same deterministic code and data, and due to the extremely high reliability of the computing substrate they're running on, have kept in sync with each other and will with near certainty do so until the end of the universe. But now the organization that is responsible for maintaining the simulation servers has nearly run out of money. They're faced with 2 possible choices:
A. Shut down all but one copy of the simulation. That copy will be maintained until the universe ends, but the 99 other copies will instantly disintegrate into dust.
B. Enter into a fair gamble at 99:1 odds with their remaining money. If they win, they can use the winnings to keep all of the servers running. But if they lose, they have to shut down all copies.
According to that organization's ethical guidelines (a version of utilitarianism), they are indifferent between the two choices and were just going to pick one randomly. But I have interceded on your behalf, and am letting you make this choice instead.
Personally, I would not be indifferent between these choices. I would prefer A to B, and I guess that most people would do so as well.
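To see why a utilitarianism that counts copies linearly is indifferent here, the two options can be written as lotteries over the number of surviving copies. The sketch below assumes that "99:1 odds" means a 1-in-100 chance of winning. The second value function, which only cares whether at least one copy survives, is a hypothetical way to model a preference for A and is not a claim from the text.

```python
from fractions import Fraction
from typing import Callable, List, Tuple

N_COPIES = 100
P_WIN = Fraction(1, 100)  # assumption: "99:1 odds" = 1-in-100 chance of winning

# A lottery is a list of (probability, number of surviving copies).
Lottery = List[Tuple[Fraction, int]]
OPTION_A: Lottery = [(Fraction(1), 1)]
OPTION_B: Lottery = [(P_WIN, N_COPIES), (1 - P_WIN, 0)]


def expected_value(lottery: Lottery, value: Callable[[int], Fraction]) -> Fraction:
    """Expected value of a lottery under a given valuation of copy counts."""
    return sum((p * value(n) for p, n in lottery), Fraction(0))


def linear(n: int) -> Fraction:
    """Each copy counts as one more unit of value."""
    return Fraction(n)


def at_least_one(n: int) -> Fraction:
    """Hypothetical alternative: identical copies add nothing beyond the first."""
    return Fraction(1 if n >= 1 else 0)
```

Under `linear`, both options are worth exactly one copy in expectation, which matches the organization's indifference. Under `at_least_one`, option A is worth 1 and option B only 1/100.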
The utility curve of the human population
"Whoever saves a single life, it is as if he had saved the whole world."
—The Talmud, Sanhedrin 4:5
That was the epigraph Eliezer used on a perfectly nice post reminding us to shut up and multiply when valuing human lives, rather than relying on the (roughly) logarithmic amount of warm fuzzies we'd receive. Implicit in the expected utility calculation is the idea that the value of human lives scales linearly: indeed, Eliezer explicitly says, "I agree that one human life is of unimaginably high value. I also hold that two human lives are twice as unimaginably valuable."
However, in a comment on Wei Dai's brilliant recent post comparing boredom and altruism, Vladimir Nesov points out that "you can value lives sublinearly" and still make an expected utility calculation rather than relying on warm-fuzzy intuition. This got me thinking about just what the functional form of U(N_living-persons) might be.
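As a rough illustration of Nesov's point, the sketch below runs the same expected utility calculation under a linear and a logarithmic U(N). The particular gamble (400 lives saved for certain versus a 90% chance of saving 500) and the choice of log(1 + N) are assumptions made for the example; the post does not commit to either.

```python
import math
from typing import Callable, List, Tuple

# A lottery is a list of (probability, lives saved). Numbers are hypothetical.
Lottery = List[Tuple[float, int]]
CERTAIN: Lottery = [(1.0, 400)]
GAMBLE: Lottery = [(0.9, 500), (0.1, 0)]


def expected_utility(lottery: Lottery, u: Callable[[int], float]) -> float:
    """Standard expected utility: sum of probability times utility."""
    return sum(p * u(n) for p, n in lottery)


def u_linear(n: int) -> float:
    """Two lives are exactly twice as valuable as one."""
    return float(n)


def u_log(n: int) -> float:
    """One possible sublinear form: log(1 + n), so that u_log(0) == 0."""
    return math.log1p(n)


def preferred(u: Callable[[int], float]) -> str:
    """Which lottery an expected-utility maximizer with utility u picks."""
    if expected_utility(GAMBLE, u) > expected_utility(CERTAIN, u):
        return "gamble"
    return "certain"
```

Both agents multiply probabilities by utilities. They differ only in the curve: `preferred(u_linear)` takes the gamble, while `preferred(u_log)` takes the sure thing.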
Boredom vs. Scope Insensitivity
How much would you pay to see a typical movie? How much would you pay to see it 100 times?
How much would you pay to save a random stranger’s life? How much would you pay to save 100 strangers?
If you are like a typical human being, your answers to both sets of questions probably exhibit failures to aggregate value linearly. In the first case, we call it boredom. In the second case, we call it scope insensitivity.
Eliezer has argued on separate occasions that one should be regarded as an obvious error to be corrected, and the other is a gift bestowed by evolution, to be treasured and safeguarded. Here, I propose to consider them side by side, and see what we can learn by doing that.
Ethics as a black box function
(Edited to add: See also this addendum.)
I commented on Facebook that I think our ethics is three-tiered. There are the things we imagine we consider right, the things we consider right, and the things we actually do. I was then asked to elaborate on the difference between the first two.
For the first one, I was primarily thinking about people following any idealized, formal ethical theories. People considering themselves act utilitarians, for instance. Yet when presented with real-life situations, they may often reply that the right course of action is different than what the purely act utilitarian framework would imply, taking into account things such as keeping promises and so on. Of course, a rule utilitarian would avoid that particular trap, but in general nobody is a pure follower of any formal ethical theory.
Now, people who don't even try to follow any formal ethical systems probably have a closer match between their first and second categories. But I recently came to view our moral intuitions as a function that takes the circumstances of the situation as an input and gives a moral judgement as an output. We do not have access to the inner workings of that function, though we can and do try to build models that attempt to capture its inner workings. Still, as our understanding of the function is incomplete, our models are bound to sometimes produce mistaken predictions.
Quantum Russian Roulette
Quantum Russian roulette is a game with 16 participants. Each of them is assigned a unique four-digit binary code and deposits $50000. They are put into deep sleep using some drug. The organizer flips a quantum coin four times. Unlike in Russian roulette, here only the participant whose code was flipped survives. The others are executed in a completely painless manner. The survivor takes all the money.
Let us assume that none of them have families or very good friends. Then the only result of the game is that the guy who wins will enjoy a much better quality of life. The others die in his Everett branch, but they live on in others. So everybody's only subjective experience will be that he went into a room and woke up $750000 richer.
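The numbers in the game follow directly from the setup, and can be checked with a short sketch. The variable and function names are illustrative only.

```python
from typing import Sequence

CODE_BITS = 4     # four quantum coin flips
STAKE = 50_000    # deposit per participant, in dollars

participants = 2 ** CODE_BITS   # one participant per distinct four-bit code
pot = participants * STAKE      # total money deposited
net_gain = pot - STAKE          # the survivor's winnings minus their own deposit
p_survive = 1 / participants    # chance that a given code comes up


def winning_code(flips: Sequence[bool]) -> str:
    """Turn a sequence of coin flips into the binary code of the survivor."""
    return "".join("1" if flip else "0" for flip in flips)
```

Four flips distinguish exactly 16 codes, the pot is $800000, and the survivor ends up $750000 richer, as stated above. Each participant's code comes up in 1 of 16 equally weighted outcomes.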
Although the game is extremely spooky to our human intuition, there are hardly any trivial objective reasons to oppose it under the following assumptions:
- Average utilitarianism
- Near 100% confidence in the Many-Worlds nature of our universe
- It is possible to kill someone without invoking any negative experiences.
The Lifespan Dilemma
One of our most controversial posts ever was "Torture vs. Dust Specks". Though I can't seem to find the reference, one of the more interesting uses of this dilemma was by a professor whose student said "I'm a utilitarian consequentialist", and the professor said "No you're not" and told them about SPECKS vs. TORTURE, and then the student - to the professor's surprise - chose TORTURE. (Yay student!)
In the spirit of always making these things worse, let me offer a dilemma that might have been more likely to unconvince the student - at least, as a consequentialist, I find the inevitable conclusion much harder to swallow.
Cookies vs Existential Risk
I've been thinking for a while now about the possible trade-offs between present recreation and small reductions in existential risk, and I've finally gotten around to a (consequentialist) utilitarian analysis.
ETA: Most of the similar mathematical treatments I've seen assume a sort of duty to unrealized people, such as Bostrom's "Astronomical Waste" paper. In addition to avoiding that assumption, my aim was to provide a more general formula for someone to use, in which they can enter differing beliefs and hypotheses. Lastly I include 3 examples using widely varying ideas, and explore the results.
Let's say that you've got a mind to make a batch of cookies. That action has a certain amount of utility, from the process itself and/or the delicious cookies. But it might lessen (or increase) the chances of you reducing existential risk, and hence affect the chance of existential disaster itself. Now if these cookies will help x-risk reduction efforts (networking!) and be enjoyable, the decision is an easy one. Same thing if they'll hurt your efforts and you hate making, eating, and giving away cookies. Any conflict arises when cookie making/eating is in opposition to x-risk reduction. If you were sufficiently egoist then risk of death would be comparable to existential disaster, and you should consider the two risks together. For readability I’ll refer simply to existential risk.
The question I'll attempt to answer is: what reduction in the probability of existential disaster makes refraining from an activity an equally good choice in terms of expected utility? If you think that by refraining and doing something else you would reduce the risk at least that much, then rationally you should pursue the alternative. If refraining would cut risk by less than this value, then head to the kitchen.
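The break-even question can be made concrete with a deliberately simplified model. This is not the formula from the post, which is not reproduced here. It assumes a single decision, a fixed utility `activity_utility` from the activity, a utility `future_utility` for the future if no disaster occurs, and zero utility if disaster occurs. All names and numbers are hypothetical.

```python
def break_even_risk_reduction(activity_utility: float, future_utility: float) -> float:
    """Risk reduction at which refraining is exactly as good as the activity.

    Simplified model (an assumption, not the post's own derivation):
      do the activity:  activity_utility + (1 - p) * future_utility
      refrain:          (1 - (p - delta)) * future_utility
    Setting the two equal gives delta = activity_utility / future_utility.
    """
    if future_utility <= 0:
        raise ValueError("future_utility must be positive in this model")
    return activity_utility / future_utility


def should_refrain(achievable_reduction: float,
                   activity_utility: float,
                   future_utility: float) -> bool:
    """True if the risk reduction you expect from refraining meets the threshold."""
    threshold = break_even_risk_reduction(activity_utility, future_utility)
    return achievable_reduction >= threshold
```

For example, if the cookies are worth 1 unit and the future is worth 1,000,000 units, the threshold is a one-in-a-million reduction in the probability of disaster. Note that the baseline risk p cancels out of this toy model; a fuller treatment would not necessarily have that property.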
My main problem with utilitarianism
It seems that in the rationalist community there's almost universal acceptance of utilitarianism as the basis of ethics. The version that seems most popular goes something like this:
- Everybody has a preference function assigning real values (utilons) to states of reality
- The preference function is a given and shouldn't be manipulated
- People try to act to maximize their number of utilons, and that's how we find out about their preference function
- People are happier when they get more utilons
- We should give everybody as many utilons as we can
There are a few obvious problems here that I won't be bothering with today:
- Any positive affine transformation of a preference function leaves what is essentially the same preference function, but it matters when we try to aggregate them. If we multiply one person's preference function values by 3^^^3, they get to decide everything in every utilitarian scenario
- The problem of total vs. average number of utilons
- People don't really act consistently with the "maximizing expected number of utilons" model
- Time discounting is a horrible mess, especially since we discount hyperbolically and are therefore inconsistent by definition
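The first of these problems is easy to demonstrate. In the sketch below, the two people, the two options, and the utility numbers are all hypothetical, and a scale factor of 3 stands in for 3^^^3. Rescaling one person's utilities leaves that person's own ranking untouched but flips the outcome chosen by summing.

```python
from typing import Dict, List

Utility = Dict[str, float]
OPTIONS: List[str] = ["park", "parking_lot"]

# Hypothetical utilities for two people over two options.
ANN: Utility = {"park": 1.0, "parking_lot": 0.0}
BEN: Utility = {"park": 0.0, "parking_lot": 0.6}


def rescale(u: Utility, scale: float, shift: float = 0.0) -> Utility:
    """Positive affine transformation: same preferences, different numbers."""
    if scale <= 0:
        raise ValueError("scale must be positive to preserve the ranking")
    return {option: scale * value + shift for option, value in u.items()}


def ranking(u: Utility) -> List[str]:
    """Options ordered from most to least preferred by one person."""
    return sorted(u, key=lambda option: u[option], reverse=True)


def social_choice(utilities: List[Utility]) -> str:
    """Option with the highest summed utility across people."""
    return max(OPTIONS, key=lambda option: sum(u[option] for u in utilities))
```

Here `social_choice([ANN, BEN])` picks the park, while `social_choice([ANN, rescale(BEN, 3.0)])` picks the parking lot, even though `ranking(BEN)` and `ranking(rescale(BEN, 3.0))` are identical.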
But my main problem is that there's very little evidence that getting utilons actually increases anybody's happiness significantly. The correlation might very well be positive, but it's just very weak. Giving people what they want is just not going to make them happy, and not giving them what they want is not going to make them unhappy. This makes perfect evolutionary sense - an organism that's content with what it has will fail in competition with one that always wants more, no matter how much it has. And an organism that's so depressed it just gives up will fail in competition with one that just tries to function as best it can in its shabby circumstances. We all had extremely successful and extremely unsuccessful cases among our ancestors, and the only reason they are on our family tree is that they went for just a bit more or, respectively, for whatever little they could get.
The modern economy is just wonderful at mass-producing utilons - we have orders of magnitude more utilons per person than our ancestors did - and it doesn't really leave people that much happier. It seems to me that the only realistic way to significantly increase global happiness is to directly hack the happiness function in the brain - by making people happy with what they have. If there's a limit in our brains, some number of utilons at which we stay happy, it's there only because it almost never happened in our evolutionary history.
There might be some drugs, or activities, or memes that increase happiness without dealing with utilons. Shouldn't we be focusing on those instead?
The Trouble With "Good"
Related to: How An Algorithm Feels From Inside, The Affect Heuristic, The Power of Positivist Thinking
I am a normative utilitarian and a descriptive emotivist: I believe utilitarianism is the correct way to resolve moral problems, but that the normal mental algorithms for resolving moral problems use emotivism.
Emotivism, aka the yay/boo theory, is the belief that moral statements, however official they may sound, are merely personal opinions of preference or dislike. Thus, "feeding the hungry is a moral duty" corresponds to "yay for feeding the hungry!" and "murdering kittens is wrong" corresponds to "boo for kitten murderers!"
Emotivism is a very nice theory of what people actually mean when they make moral statements. Billions of people around the world, even the non-religious, happily make moral statements every day without having any idea what they reduce to or feeling like they ought to reduce to anything.
Emotivism also does a remarkably good job capturing the common meanings of the words "good" and "bad". An average person may have beliefs like "pizza is good, but seafood is bad", "Israel is good, but Palestine is bad", "the book was good, but the movie was bad", "atheism is good, theism is bad", "evolution is good, creationism is bad", and "dogs are good, but cats are bad". Some of these seem to be moral beliefs, others seem to be factual beliefs, and others seem to be personal preferences. But we are happy using the word "good" for all of them, and it doesn't feel like we're using the same word in several different ways, the way it does when we use "right" to mean both "correct" and "opposite of left". It feels like they're all just the same thing. The moral theory that captures that feeling is emotivism. Yay pizza, books, Israelis, atheists, dogs, and evolution! Boo seafood, Palestinians, movies, theists, creationism, and cats!