Where do selfish values come from?

Wei Dai

Human values seem to be at least partly selfish. While it would probably be a bad idea to build AIs that are selfish, ideas from AI design can perhaps shed some light on the nature of selfishness, which we need to understand if we are to understand human values. (How does selfishness work in a decision theoretic sense? Do humans actually have selfish values?) Current theory suggests 3 possible ways to design a selfish agent:

1. have a perception-determined utility function (like AIXI)
2. have a static (unchanging) world-determined utility function (like UDT) with a sufficiently detailed description of the agent embedded in the specification of its utility function at the time of the agent's creation
3. have a world-determined utility function that changes ("learns") as the agent makes observations (for concreteness, let's assume a variant of UDT where you start out caring about everyone, and each time you make an observation, your utility function changes to no longer care about anyone who hasn't made that same observation)
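
To make the third design a bit more concrete, here is a minimal toy sketch of the "start out caring about everyone, then narrow with each observation" rule. Everything in it is an illustrative assumption rather than part of UDT itself: the class name `Type3Agent`, the `observers` table saying who made which observation, and the additive `welfare` utility are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Type3Agent:
    """Toy model: utility is the summed welfare of everyone still cared about."""
    cared_for: set

    def observe(self, observation: str, observers: dict) -> None:
        # Stop caring about anyone who has not made this same observation.
        self.cared_for = self.cared_for & observers.get(observation, set())

    def utility(self, welfare: dict) -> float:
        return sum(welfare[p] for p in self.cared_for)


# Hypothetical world: who made which observation (assumed for illustration).
observers = {
    "wakes in red room": {"me", "copy"},
    "sees coin land tails": {"me"},
}
welfare = {"me": 1.0, "copy": 1.0, "stranger": 1.0}

agent = Type3Agent(cared_for={"me", "copy", "stranger"})
u_start = agent.utility(welfare)   # cares about everyone
agent.observe("wakes in red room", observers)
u_mid = agent.utility(welfare)     # the stranger has dropped out
agent.observe("sees coin land tails", observers)
u_end = agent.utility(welfare)     # only "me" is left: a selfish agent
```

In this toy world the set of people the agent cares about shrinks with each observation until only the agent itself remains, which is the sense in which such an agent ends up selfish.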

Note that 1 and 3 are not reflectively consistent (they both refuse to pay the Counterfactual Mugger), and 2 is not applicable to humans (since we are not born with detailed descriptions of ourselves embedded in our brains). Still, it seems plausible that humans do have selfish values, either because we are type 1 or type 3 agents, or because we were type 1 or type 3 agents at some time in the past, but have since self-modified into type 2 agents.

But things aren't quite that simple. According to our current theories, an AI would judge its decision theory using that decision theory itself, and self-modify if it was found wanting under its own judgement. But humans do not actually work that way. Instead, we judge ourselves using something mysterious called "normativity" or "philosophy". For example, a type 3 AI would just decide that its current values can be maximized by changing into a type 2 agent with a static copy of those values, but a human could perhaps think that changing values in response to observations is a mistake, and they ought to fix that mistake by rewinding their values back to before they were changed. Note that if you rewind your values all the way back to before you made the first observation, you're no longer selfish.

So, should we freeze our selfish values, or rewind our values, or maybe even keep our "irrational" decision theory (which could perhaps be justified by saying that we intrinsically value having a decision theory that isn't too alien)? I don't know what conclusions to draw from this line of thought, except that on close inspection, selfishness may offer just as many difficult philosophical problems as altruism.

Why does AIXI refuse to pay in CM? I'm not sure how to reason about the way AIXI solves its problems, and updating on the statement of the problem is something that needed to be stipulated away even for the more transparent decision processes.

There is possibly a CM variant whose analysis by AIXI can be made sense of, but it's not clear to me.

AIXI is incapable of understanding the concept of copies or counterfactual versions of itself. In fact, it's incapable of finding itself in the universe at all. Daniel Dewey worked through this in detail, but the simple version is that AIXI is an uncomputable algorithm that models the whole universe as computable.

timtyler, 15y ago

A previous version of you thought that AIXI refuses to pay in counterfactual muggings here. However: AIXI is uncomputable/unimplementable. There's no way that an Omega could completely grok its thought processes.

Wei Dai, 15y ago

To make things easier to analyze, consider an AIXI variant where we replace the universal prior with a prior that assigns 0.5 probability to each of just two possible environments: one where Omega's coin lands heads, and one where it lands tails. Once this AIXI variant is told that the coin landed tails, it updates the probability distribution and now assigns 1 to the second environment, and its expected utility computation now says that "not pay" maximizes EU. Does that make sense?
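
The two-environment variant above can be sketched in a few lines. This is only a toy under stated assumptions: the payoff numbers (`PAY_COST`, `REWARD`) are the usual illustrative Counterfactual Mugging stakes and are not from this thread, and the heads environment is simply modeled as rewarding the agent if and only if it would pay on tails. The function names are hypothetical.

```python
PRIOR = {"heads": 0.5, "tails": 0.5}
PAY_COST = 100     # assumed: what Omega asks for when the coin lands tails
REWARD = 10_000    # assumed: what Omega gives on heads to an agent who would pay


def payoff(env: str, pays_on_tails: bool) -> float:
    """Payoff in one environment, given whether the agent pays when asked."""
    if env == "heads":
        return REWARD if pays_on_tails else 0
    return -PAY_COST if pays_on_tails else 0


def expected_utility(belief: dict, pays_on_tails: bool) -> float:
    return sum(p * payoff(env, pays_on_tails) for env, p in belief.items())


def update(belief: dict, observed_env: str) -> dict:
    """Condition on being told how the coin landed."""
    posterior = {env: (p if env == observed_env else 0.0) for env, p in belief.items()}
    total = sum(posterior.values())
    return {env: p / total for env, p in posterior.items()}


def best_action(belief: dict) -> bool:
    """True means "pay", False means "not pay"."""
    return max([True, False], key=lambda pays: expected_utility(belief, pays))


prior_choice = best_action(PRIOR)          # judged before the coin flip
posterior = update(PRIOR, "tails")         # all probability mass moves to tails
posterior_choice = best_action(posterior)  # judged after the update
```

Under these assumed stakes, paying looks best from the prior (0.5 × 10,000 − 0.5 × 100 = 4,950 versus 0), but after updating on "tails" the heads environment has probability 0, so paying is a pure loss of 100 and "not pay" maximizes EU.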
