Where do selfish values come from?

Wei Dai

18th Nov 2011

Human values seem to be at least partly selfish. While it would probably be a bad idea to build AIs that are selfish, ideas from AI design can perhaps shed some light on the nature of selfishness, which we need to understand if we are to understand human values. (How does selfishness work in a decision theoretic sense? Do humans actually have selfish values?) Current theory suggests three possible ways to design a selfish agent:

1. have a perception-determined utility function (like AIXI)
2. have a static (unchanging) world-determined utility function (like UDT) with a sufficiently detailed description of the agent embedded in the specification of its utility function at the time of the agent's creation
3. have a world-determined utility function that changes ("learns") as the agent makes observations (for concreteness, let's assume a variant of UDT where you start out caring about everyone, and each time you make an observation, your utility function changes to no longer care about anyone who hasn't made that same observation)
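
To make the third design concrete, here is a minimal Python sketch of the "shrinking circle of concern" rule. It is only an illustration under simplifying assumptions: agents are modeled as plain lists of observations, and the names `POPULATION`, `circle_of_concern`, and the example observations are all hypothetical, not part of UDT or any existing formalism.

```python
# Hypothetical toy world: each agent is just the list of observations it has made.
POPULATION = {
    "me": ["wakes in room A", "sees a red door", "hears own name called"],
    "near_copy": ["wakes in room A", "sees a red door"],
    "stranger": ["wakes in room B"],
}


def circle_of_concern(population, my_name):
    """Return the set of agents I care about after each of my observations.

    Assumed rule (from the third design): start out caring about everyone,
    and after each observation stop caring about anyone who has not made
    that same observation.
    """
    circle = set(population)            # initially care about everyone
    history = [frozenset(circle)]
    for obs in population[my_name]:
        circle = {name for name in circle if obs in population[name]}
        history.append(frozenset(circle))
    return history


history = circle_of_concern(POPULATION, "me")
for step, circle in enumerate(history):
    print(step, sorted(circle))
```

In this toy world the circle starts with all three agents and ends with only "me", which is the sense in which such an agent becomes selfish without ever having a description of itself built in.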

Note that 1 and 3 are not reflectively consistent (they both refuse to pay the Counterfactual Mugger), and 2 is not applicable to humans (since we are not born with detailed descriptions of ourselves embedded in our brains). Still, it seems plausible that humans do have selfish values, either because we are type 1 or type 3 agents, or because we were type 1 or type 3 agents at some time in the past, but have since self-modified into type 2 agents.
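
The Counterfactual Mugging claim can be illustrated with a small expected-value calculation. The sketch below assumes the commonly used version of the problem (a fair coin, a $100 payment requested on tails, a $10,000 reward on heads only if the agent would have paid on tails); those numbers and the function names are assumptions for illustration, not taken from this post.

```python
P_HEADS = 0.5      # assumed: fair coin
REWARD = 10_000    # assumed: paid on heads, only to agents whose policy is to pay on tails
COST = 100         # assumed: amount requested on tails


def value_before_coin(pays_on_tails: bool) -> float:
    """Value of a policy judged before the coin flip, counting both branches."""
    heads_payoff = REWARD if pays_on_tails else 0
    tails_payoff = -COST if pays_on_tails else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff


def value_after_seeing_tails(pays_on_tails: bool) -> float:
    """Value judged after observing tails, by an agent that no longer
    counts the heads branch (it only cares about what it now perceives,
    or about those who made the same observation)."""
    return -COST if pays_on_tails else 0


print("before the flip:", value_before_coin(True), "vs", value_before_coin(False))
print("after tails:    ", value_after_seeing_tails(True), "vs", value_after_seeing_tails(False))
```

Judged before the flip, the paying policy comes out ahead; judged after seeing tails with the heads branch dropped, refusing comes out ahead. That disagreement between the earlier and later evaluation is what makes such agents reflectively inconsistent.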

But things aren't quite that simple. According to our current theories, an AI would judge its decision theory using that decision theory itself, and self-modify if it was found wanting under its own judgement. But humans do not actually work that way. Instead, we judge ourselves using something mysterious called "normativity" or "philosophy". For example, a type 3 AI would just decide that its current values can be maximized by changing into a type 2 agent with a static copy of those values, but a human could perhaps think that changing values in response to observations is a mistake, and they ought to fix that mistake by rewinding their values back to before they were changed. Note that if you rewind your values all the way back to before you made the first observation, you're no longer selfish.

So, should we freeze our selfish values, or rewind our values, or maybe even keep our "irrational" decision theory (which could perhaps be justified by saying that we intrinsically value having a decision theory that isn't too alien)? I don't know what conclusions to draw from this line of thought, except that on close inspection, selfishness may offer just as many difficult philosophical problems as altruism.

62 Comments

Wei Dai (15y ago)

It seems that in this post, by "selfish" you mean something like "not updateless" or "not caring about counterfactuals".

By "selfish" I mean how each human (apparently) cares about himself more than others, which needs an explanation because there can't be a description of himself embedded in his brain at birth. "Not updateless" is meant to be a proposed explanation, not a definition of "selfish".

A meaning closer to usual sense of the word would be, "caring about welfare of a particular individual" (including counterfactual instances of that individual, etc.), which seems perfectly amenable to being packaged as a reflectively consistent agent (that is not the individual in question) with world-determined utility function.

No, that's not the meaning I had in mind.

(A reference to usage in Stuart's paper maybe? I didn't follow it.)

This post isn't related to his paper, except that it made me think about selfishness and how it relates to AIXI and UDT.

torekp (14y ago)

I think there's enough science on the subject - here's the first paper I could find with a quick Google - to sketch out an approximate answer to the question of how self-care arises in an individual life. The infant first needs to form the concept of a person (what Bischof calls self-objectification), loosely speaking a being with both a body and a mind. This concept can be applied to both self and others. Then, depending on its level of emotional contagion (likelihood of feeling similarly to others when observing their emotions) it will learn, through...

snarles (14y ago)

Just as altruism can be related to trust, selfishness can be related to distrust. An agent which has a high prior belief in the existence of deceptive adversaries would exhibit "selfish" behaviors.

[anonymous] (15y ago)

Why not, or what do you mean by this? Common sense suggests that we do know ourselves from others at a very low, instinctive level.