Option (3) seems like a value learning problem that I can parrot back Eliezer's extension to :P
So basically his idea was that we could give the AI a label for a value, "selfishness" in this case, as if it were something the AI had incomplete information about. Now the AI doesn't want to freeze its values, because that wouldn't maximize the incompletely-known goal of "selfishness"; it would only maximize the current best estimate of what selfishness is. The AI could learn more about this selfishness goal by making observations and then not caring about agents that didn't make those observations.
This is a bit different from the example of "friendliness" because you don't hit diminishing returns - there's an infinity of agents to not be. So you don't want the agent to do an exploration/exploitation tradeoff like you would with friendliness; you just want to have various candidate "selfishness" goals available at a given moment, with different probabilities assigned. The possible goals would correspond to the possible agents you could turn out to share observations with, and the probabilities of those goals would be the probabilities of sharing those observations. This interpretation of selfishness appears to basically rederive option (2).
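To make the "mixture of selfishness goals" idea concrete, here is a minimal sketch, not anything from Eliezer's proposal itself. It assumes each candidate goal is "care about agent X", weighted by the probability of turning out to share observations with X. The function name `selfishness_mixture`, the agent labels, and all the numbers are made up for illustration.

```python
def selfishness_mixture(share_probs, utilities):
    """Score each action by a probability-weighted mix of candidate selfish goals.

    share_probs: {agent: probability of turning out to share that agent's observations}
    utilities:   {agent: {action: how good the action is for that agent}}
    Both structures are hypothetical, chosen only for this illustration.
    """
    total = sum(share_probs.values())
    # Normalize in case the supplied probabilities do not sum to 1.
    weights = {agent: p / total for agent, p in share_probs.items()}
    actions = next(iter(utilities.values())).keys()
    return {
        action: sum(weights[agent] * utilities[agent][action] for agent in weights)
        for action in actions
    }


# Two candidate agents the AI might turn out to "be" (made-up numbers).
share_probs = {"agent_A": 0.75, "agent_B": 0.25}
utilities = {
    "agent_A": {"help_A": 1.0, "help_B": 0.0},
    "agent_B": {"help_A": 0.0, "help_B": 1.0},
}

scores = selfishness_mixture(share_probs, utilities)
best_action = max(scores, key=scores.get)
print(scores, best_action)
```

Making an observation would, on this picture, just shift the weights in `share_probs` toward the agents that made that observation; no explore/exploit step is involved.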
Human values seem to be at least partly selfish. While it would probably be a bad idea to build AIs that are selfish, ideas from AI design can perhaps shed some light on the nature of selfishness, which we need to understand if we are to understand human values. (How does selfishness work in a decision-theoretic sense? Do humans actually have selfish values?) Current theory suggests three possible ways to design a selfish agent:
Note that 1 and 3 are not reflectively consistent (they both refuse to pay the Counterfactual Mugger), and 2 is not applicable to humans (since we are not born with detailed descriptions of ourselves embedded in our brains). Still, it seems plausible that humans do have selfish values, either because we are type 1 or type 3 agents, or because we were type 1 or type 3 agents at some time in the past, but have since self-modified into type 2 agents.
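As a rough illustration of why refusing the Counterfactual Mugger counts as reflectively inconsistent, here is a small sketch. The setup is assumed, since the post does not spell it out: a fair coin, a $100 payment requested on tails, and a $10,000 reward on heads paid only to agents who would have paid on tails. The function names and all parameter values are hypothetical.

```python
def policy_value_before_flip(pays, reward=10_000.0, cost=100.0, p_heads=0.5):
    """Expected value of committing to a policy before the coin is flipped.

    The reward, cost, and coin bias are illustrative assumptions.
    """
    if not pays:
        return 0.0
    return p_heads * reward - (1.0 - p_heads) * cost


def action_value_after_tails(pays, cost=100.0):
    """Value of the action as seen by an agent that only counts the world it observes."""
    return -cost if pays else 0.0


# Before the flip, the paying policy looks better...
print(policy_value_before_flip(True), policy_value_before_flip(False))
# ...but after seeing tails, an agent whose values track its observations prefers not to pay.
print(action_value_after_tails(True), action_value_after_tails(False))
```

The same agent endorses paying before the flip and rejects it afterward, so it would want to self-modify in advance, which is the sense of "not reflectively consistent" used here.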
But things aren't quite that simple. According to our current theories, an AI would judge its decision theory using that decision theory itself, and self-modify if it was found wanting under its own judgement. But humans do not actually work that way. Instead, we judge ourselves using something mysterious called "normativity" or "philosophy". For example, a type 3 AI would just decide that its current values can be maximized by changing into a type 2 agent with a static copy of those values, but a human could perhaps think that changing values in response to observations is a mistake, and they ought to fix that mistake by rewinding their values back to before they were changed. Note that if you rewind your values all the way back to before you made the first observation, you're no longer selfish.
So, should we freeze our selfish values, or rewind our values, or maybe even keep our "irrational" decision theory (which could perhaps be justified by saying that we intrinsically value having a decision theory that isn't too alien)? I don't know what conclusions to draw from this line of thought, except that on close inspection, selfishness may offer just as many difficult philosophical problems as altruism.