Human values seem to be at least partly selfish. While it would probably be a bad idea to build AIs that are selfish, ideas from AI design can perhaps shed some light on the nature of selfishness, which we need to understand if we are to understand human values. (How does selfishness work in a decision-theoretic sense? Do humans actually have selfish values?) Current theory suggests three possible ways to design a selfish agent (sketched in toy code after the list):
1. Have a perception-determined utility function (like AIXI).
2. Have a static (unchanging) world-determined utility function (like UDT), with a sufficiently detailed description of the agent embedded in the specification of its utility function at the time of the agent's creation.
3. Have a world-determined utility function that changes ("learns") as the agent makes observations. (For concreteness, assume a variant of UDT where you start out caring about everyone, and each time you make an observation, your utility function changes to no longer care about anyone who hasn't made that same observation.)
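To make the three designs concrete, here is a toy Python sketch. This is my own illustration, not AIXI or UDT proper: the `World` class, the agent ids, and the payoff dictionaries are all hypothetical simplifications.

```python
from dataclasses import dataclass

@dataclass
class World:
    history: tuple   # the observations produced in this world
    outcome: dict    # payoff to each agent, keyed by agent id

# Type 1: perception-determined utility -- value is a function of the
# agent's own observation stream (as with AIXI's reward channel).
def type1_utility(my_observations):
    return sum(my_observations)  # e.g. treat observations as rewards

# Type 2: static world-determined utility -- fixed at creation time,
# with a description of "me" baked in (here reduced to an agent id).
def make_type2_utility(my_id):
    def utility(world):
        return world.outcome[my_id]
    return utility

# Type 3: world-determined utility that "learns" from observations --
# it starts out caring about everyone, and each observation removes
# everyone who hasn't made that same observation.
@dataclass
class Type3Utility:
    cared_about: set
    def observe(self, obs, who_also_observed):
        self.cared_about &= who_also_observed  # drop non-observers
    def utility(self, world):
        return sum(world.outcome[a] for a in self.cared_about)
```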
Note that types 1 and 3 are not reflectively consistent (both refuse to pay the Counterfactual Mugger, as worked through below), and type 2 is not applicable to humans (since we are not born with detailed descriptions of ourselves embedded in our brains). Still, it seems plausible that humans do have selfish values: either we are type 1 or type 3 agents, or we were type 1 or type 3 agents at some time in the past but have since self-modified into type 2 agents.
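For readers who want the Counterfactual Mugging arithmetic spelled out: Omega flips a fair coin; on heads, it pays you $10,000 iff it predicts you would pay $100 when asked on tails; on tails, it asks you for the $100. The stakes below are the ones conventionally used in discussions of the problem; the sketch just runs the two expected-value calculations.

```python
P_HEADS = P_TAILS = 0.5
REWARD, COST = 10_000, 100  # the stakes conventionally used for this problem

# Updateless (type 2) evaluation: commit to a policy before seeing the coin.
ev_pay = P_HEADS * REWARD + P_TAILS * (-COST)  # 0.5*10000 - 0.5*100 = 4950.0
ev_refuse = 0.0
print("updateless verdict:", "pay" if ev_pay > ev_refuse else "refuse")  # pay

# Updating (type 1 / type 3) evaluation: after observing tails, the
# heads-world no longer counts, so paying looks like a pure loss.
ev_pay_given_tails = -COST  # -100
print("after updating on tails:",
      "pay" if ev_pay_given_tails > 0 else "refuse")  # refuse
```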
But things aren't quite that simple. According to our current theories, an AI would judge its decision theory using that decision theory itself, and self-modify if it were found wanting under its own judgement. But humans do not actually work that way. Instead, we judge ourselves using something mysterious called "normativity" or "philosophy". For example, a type 3 AI would simply decide that its current values are best maximized by changing into a type 2 agent with a static copy of those values, but a human could perhaps think that changing values in response to observations is a mistake, and that they ought to fix that mistake by rewinding their values back to before they were changed. Note that if you rewind your values all the way back to before your first observation, you're no longer selfish. (The sketch below contrasts the two self-modifications.)
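In terms of the toy sketch above, "freezing" and "rewinding" are two different self-modifications applied to the type 3 utility function. Again, this is purely illustrative code, continuing the hypothetical classes from the earlier sketch:

```python
# Continuing the toy sketch: two self-modifications that could be
# applied to a Type3Utility instance (purely illustrative).

def freeze(type3):
    # What the type 3 agent itself would endorse: lock in the current,
    # already-narrowed circle of concern as a static type 2 utility.
    frozen = set(type3.cared_about)
    return lambda world: sum(world.outcome[a] for a in frozen)

def rewind(everyone):
    # What a human who judges the updates to be a *mistake* might prefer:
    # restore the pre-observation utility. Rewound all the way back, the
    # agent cares about everyone again -- i.e., it is no longer selfish.
    restored = set(everyone)
    return lambda world: sum(world.outcome[a] for a in restored)
```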
So, should we freeze our selfish values, or rewind our values, or maybe even keep our "irrational" decision theory (which could perhaps be justified by saying that we intrinsically value having a decision theory that isn't too alien)? I don't know what conclusions to draw from this line of thought, except that on close inspection, selfishness may offer just as many difficult philosophical problems as altruism.
It used to, as Tim notes, but I'm not so sure now. AIXI works with its distribution over programs and sequences of observations, not with states of a world and its properties. If presented with a sequence of observations generated by a program, it quickly figures out what the following observations will be, but the situation here is trickier.
With other types of agents, we usually need to stipulate that the problem statement is somehow made clear to the agent. The way in which this could be achieved is not specified, and it seems very difficult to arrange by presenting an actual sequence of observations. So the shortcut is to draw the problem "directly" on the agent's mind, in terms of the agent's ontology, and usually this is possible in a moderately natural way. This all takes place apart from the agent observing the state of the coin.
However, in the case of AIXI, it's not as clear how the elements of the problem setting should be expressed in terms of its ontology. Basically, we have two worlds corresponding to the two coin states, which for simplicity can be assumed to be generated by two programs. The first idea is to identify the programs generating these worlds with the relevant hypotheses of AIXI, so that observing "tails" excludes the "heads"-programs, and therefore the "heads"-world, from consideration.
But there are many possible "tails"-programs, and AIXI's response depends on their distribution. For example, the choice of a particular "tails"-program could encode the state of other worlds. What does it say about this distribution that the problem statement was properly explained to the AIXI agent? It must necessarily be more than just observing "tails", just as for other types of agents (if you merely toss a coin and it falls "tails", that observation alone doesn't move me to pay up). Perhaps the "tails"-programs that properly model Counterfactual Mugging also imply paying the mugger.
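One way to picture this point: AIXI's state of belief after seeing "tails" is a complexity-weighted posterior over all programs consistent with its observations, and its behaviour depends on the relative weights of the surviving "tails"-programs. A toy sketch, with a finite, invented program table standing in for the space of all programs:

```python
# Toy stand-in for AIXI's Solomonoff-style posterior: a finite table of
# "programs", each with a length (complexity) and the observation stream
# it generates. The entries are invented for illustration.
programs = [
    # (name, length in bits, observations it outputs)
    ("heads-world",       10, ["heads"]),
    ("tails-pay-matters", 12, ["tails", "omega-rewards-payers"]),
    ("tails-plain-coin",  11, ["tails", "nothing-further"]),
]

def posterior(observed):
    # Weight each program by 2^-length, keep those consistent with the
    # observations so far, and renormalize.
    weights = {name: 2.0 ** -length
               for name, length, obs in programs
               if obs[:len(observed)] == observed}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Observing "tails" excludes the heads-programs, but the agent's response
# still depends on the relative weight of the surviving tails-programs.
print(posterior(["tails"]))
```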
I don't understand. Isn't the biggest missing piece an AIXI agent's precise utility function, rather than its uncertainty?