The idea is to compare not the results of actions, but the results of decision algorithms. The question that the agent should ask itself is thus:
"Suppose everyone1 who runs the same thinking procedure like me uses decision algorithm X. What utility would I get at the 50th percentile (not: what expected utility should I get), after my life is finished?"
Then, he should of course look for the X that maximizes this value.
Now, if you formulate a turing-complete "decision algorithm", this heads into an infinite loop. But suppose that "decision algorithm" is defined as a huge table for lots of different possible situations, and the appropriate outputs.
Let's see what results such a thing should give:
- If the agent has the possibility to play a gamble, and the probabilities involved are not small, and he expects to be allowed to play many gambles like this in the future, he should decide exactly as if he was maximizing expected utility: If he has made many decisions like this, he will get a positive utility difference in the 50th percentile if and only if his expected utility from playing the gamble is positive.
- However, if Pascal's mugger comes along, he will decline: The complete probability of living in a universe where people like this mugger ought to be taken seriously is small. In the probability distribution over expected utility at the end of the agent's lifetime, the possibility of getting tortured will manifest itself only very slightly at the 50th percentile - much less than the possibility of losing 5 Dollars.
The reason why humans will intuitively decline to give money to the mugger might be similar: They imagine not the expected utility with both decisions, but the typical outcome of giving the mugger some money, versus declining to.
1I say this to make agents of the same type cooperate in prisoner-like dilemmas.