Secrets of the eliminati

Scott Alexander

Anyone who does not believe mental states are ontologically fundamental - ie anyone who denies the reality of something like a soul - has two choices about where to go next. They can try reducing mental states to smaller components, or they can stop talking about them entirely.

In a utility-maximizing AI, mental states can be reduced to smaller components. The AI will have goals, and those goals, upon closer examination, will be lines in a computer program.

But in the blue-minimizing robot, its "goal" isn't even a line in its program. There's nothing that looks remotely like a goal in its programming, and goals appear only when you make rough generalizations from its behavior in limited cases.

Philosophers are still very much arguing about whether this applies to humans; the two schools call themselves reductionists and eliminativists (with a third school of wishy-washy half-and-half people calling themselves revisionists). Reductionists want to reduce things like goals and preferences to the appropriate neurons in the brain; eliminativists want to prove that humans, like the blue-minimizing robot, don't have anything of the sort until you start looking at high level abstractions.

I took a similar tack asking ksvanhorn's question in yesterday's post - how can you get a more accurate picture of what your true preferences are? I said:

I don't think there are true preferences. In one situation you have one tendency, in another situation you have another tendency, and "preference" is what it looks like when you try to categorize tendencies. But categorization is a passive and not an active process: if every day of the week I eat dinner at 6, I can generalize to say "I prefer to eat dinner at 6", but it would be non-explanatory to say that a preference toward dinner at 6 caused my behavior on each day. I think the best way to salvage preferences is to consider them as tendencies currently in reflective equilibrium.

A more practical example: when people discuss cryonics or anti-aging, the following argument usually comes up in one form or another: if you were in a burning building, you would try pretty hard to get out. Therefore, you must strongly dislike death and want to avoid it. But if you strongly dislike death and want to avoid it, you must be lying when you say you accept death as a natural part of life and think it's crass and selfish to try to cheat the Reaper. And therefore your reluctance to sign up for cryonics violates your own revealed preferences! You must just be trying to signal conformity or something.

The problem is that not signing up for cryonics is also a "revealed preference". "You wouldn't sign up for cryonics, which means you don't really fear death so much, so why bother running from a burning building?" is an equally good argument, although no one except maybe Marcus Aurelius would take it seriously.

Both these arguments assume that somewhere, deep down, there's a utility function with a single term for "death" in it, and all decisions just call upon this particular level of death or anti-death preference.

More explanatory of the way people actually behave is that there's no unified preference for or against death, but rather a set of behaviors. Being in a burning building activates fleeing behavior; contemplating death from old age does not activate cryonics-buying behavior. People guess at their opinions about death by analyzing these behaviors, usually with a bit of signalling thrown in. If they desire consistency - and most people do - maybe they'll change some of their other behaviors to conform to their hypothesized opinion.

One more example. I've previously brought up the case of a rationalist who knows there's no such thing as ghosts, but is still uncomfortable in a haunted house. So does he believe in ghosts or not? If you insist on there being a variable somewhere in his head marked $belief_in_ghosts = (0,1) then it's going to be pretty mysterious when that variable looks like zero when he's talking to the Skeptics Association, and one when he's running away from a creaky staircase at midnight.

But it's not at all mysterious that the thought "I don't believe in ghosts" gets reinforced because it makes him feel intelligent and modern, and staying around a creaky staircase at midnight gets punished because it makes him afraid.

Behaviorism was one of the first and most successful eliminationist theories. I've so far ignored the most modern and exciting eliminationist theory, connectionism, because it involves a lot of math and is very hard to process on an intuitive level. In the next post, I want to try to explain the very basics of connectionism, why it's so exciting, and why it helps justify discussion of behaviorist principles.

I took a similar tack asking ksvanhorn's question in yesterday's post - how can you get a more accurate picture of what your true preferences are? I said:

I don't think there are true preferences. In one situation you have one tendency, in another situation you have another tendency, and "preference" is what it looks like when you try to categorize tendencies. But categorization is a passive and not an active process: if every day of the week I eat dinner at 6, I can generalize to say "I prefer to eat dinner at 6", but it would be non-explanatory to say that a preference toward dinner at 6 caused my behavior on each day. I think the best way to salvage preferences is to consider them as tendencies currently in reflective equilibrium.

If everyone's inferred utility goes from 0 to 1, and the real-life utility monster cares more than the other people about one thing, the inferred utility will say he cares less than other people about something else. Let him play that game until the something else happens, then he loses, and that's a fine outcome.

That's not the situation I'm describing; if 0 is "you and all your friends and relatives getting tortured to death" and 1 is "getting everything you want," the utility monster is someone who puts "not getting one thing I want" at, say, .1 whereas normal people put it at .9999.

You have failed to disagree with me. My proposal exactly fits your alleged counterexample.

Suppose Alice is a utility monster where:

U(Alice, torture of everybody) = 0
U(Alice, everything) = 1
U(Alice, no cookie) = 0.1
U(Alice, Alice dies) = 0.05

And Bob is normal, except he doesn't like Alice:

U(Bob, torture of everybody) = 0
U(Bob, everything) = 1
U(Bob, Alice lives, no cookie) = 0.8
U(Bob, Alice dies, no cookie) = 0.9

If the FAI has a cookie it can give to Bob or Alice, it will give it to Alice, since U(cookie to Bob) = U(Bob, everything) + U(Alice, everything but a cookie) = 1 + 0.1 = 1.1 < U(cookie to Alice) = U(Bob, everything but a cookie) + U(Alice, everything) = 0.8 + 1 = 1.8. Thus Alice gets her intended reward for being a utility monster.

However, if the are no cookies available and the FAI can kill Alice, it will do so for the benefit of Bob, since U(Bob, Alice lives, no cookie) + U(Alice, Alice lives, no cookie) = 0.8 + 0.1 = 0.9 < U(Bob, Alice dies, no cookie) + U(Alice, Alice dies) = 0.9 + 0.05 = 0.95. The basic problem is that since Alice had the cookie fixation, that ate up so much of her utility range that her desire to live in the absence of the cookie was outweighed by Bob finding her irritating.

Another problem with Alice's utility is that it supports the FAI doing lotteries that Alice would apparently prefer but a normal person would not. For example, assuming the outcome for Bob does not change, the FAI should prefer 50% Alice dies + 50% Alice gets a cookie (adds to 0.525) over 100% Alice lives without a cookie (which is 0.1). This is a different issue from interpersonal utility comparison.

How do you add two utilities together?

They are numbers. Add them.

And if humans turn out to be adaption-executers, then utility is going to look really weird, because it'll depend a lot on framing and behavior.

Yes. So far as I can tell, if the FAI is going to do what people want, it has to model people as though they want something, and that means ascribing utility functions to them. Better alternatives are welcome. Giving up because it's a hard problem is not welcome.

If people dislike losses more than they like gains and status is zero-sum, does that mean the reasonable result of average utilitarianism when applied to status is that everyone must be exactly the same status?

No. If Alice has high status and Bob has low status, and the FAI takes action to lower Alice's status and raise Bob's, and people hate losing, then Alice's utility decrease will exceed Bob's utility increase, so the FAI will prefer to leave the status as it is. Similarly, the FAI isn't going to want to increase Alice's status at the expense of Bob. The FAI just won't get involved in the status battles.

I have not found this conversation rewarding. Unless there's an obvious improvement in the quality of your arguments, I'll drop out.

Edit: Fixed the math on the FAI-kills-Alice scenario. Vaniver continued to change the topic with every turn, so I won't be continuing the conversation.

How do you add two utilities together?

They are numbers. Add them.

So are the atmospheric pressure in my room and the price of silver. But you cannot add them together (unless you have a conversion factor from millibars to dollars per ounce).

1Vaniver15y

What if wants did not exist a priori, but only in response to stimuli? Alice, for example, doesn't care about cookies, she cares about getting her way. If the FAI tells Alice and Bob "look, I have a cookie; how shall I divide it between you?" Alice decides that the cookie is hers and she will throw the biggest tantrum if the FAI decides otherwise, whereas Bob just grumbles to himself. If the FAI tells Alice and Bob individually "look, I'm going to make a cookie just for you, what would you like in it?" both of them enjoy the sugar, the autonomy of choosing, and the feel of specialness, without realizing that they're only eating half of the cookie dough. Suppose Alice is just as happy in both situations, because she got her way in both situations, and that Bob is happier in the second situation, because he gets more cookie. In such a scenario, the FAI would never ask Alice and Bob to come up with a plan to split resources between the two of them, because Alice would turn it into a win/lose situation. It seems to me that an FAI would engage in want curation rather than want satisfaction. As the saying goes, seek to want what you have, rather than seeking to have what you want. A FAI who engages in that behavior would be more interested in a stimuli-response model of human behavior and mental states than a consequentialist-utility model of human behavior and mental states. [...] This is one of the reasons why utility monsters tend to seem self-destructive; they gamble farther and harder than most people would. [...] How do we measure one person's utility? Preferences revealed by actions? (That is, given a mapping from situations to actions to consequences, I can construct a utility function which takes situations and consequences as inputs and returns the decision taken.) If so, when we add two utilities together, does the resulting number still uniquely identify the actions taken by both parties?

137

Secrets of the eliminati

137

137

137

Secrets of the eliminati

137

137