Expected utility can be expressed as the sum ΣP(X_n)U(X_n). Suppose P(X_n) = 2^-n and U(X_n) = (-2)^n/n. Then expected utility = Σ2^-n·(-2)^n/n = Σ(-1)^n/n = -1 + 1/2 - 1/3 + 1/4 - ... = -ln(2). Except there's no obvious order in which to add it. You could just as well say it's -1 + 1/2 + 1/4 + 1/6 + 1/8 - 1/3 + 1/10 + 1/12 + 1/14 + 1/16 - 1/5 + ... = 0. The sum depends on the order in which you add the terms. This is known as conditional convergence.
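A quick numerical sketch makes the order-dependence concrete (the block sizes are hard-coded to match the rearrangement above; nothing else is assumed):

```python
import math

# Terms are P(X_n) * U(X_n) = (-1)^n / n. Summed in the natural order the
# partial sums approach -ln(2); summed as one negative term followed by
# four positive terms, they approach 0 instead.
N = 200_000

# Natural order: -1 + 1/2 - 1/3 + 1/4 - ...
original = sum((-1) ** n / n for n in range(1, N + 1))

# Rearranged order: one odd (negative) term, then four even (positive) terms.
rearranged = 0.0
odd, even = 1, 2
for _ in range(N // 5):
    rearranged += -1.0 / odd
    odd += 2
    for _ in range(4):
        rearranged += 1.0 / even
        even += 2

print(original, -math.log(2))   # both ~ -0.6931
print(rearranged)               # ~ 0
```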
This is clearly something we want to avoid. Suppose my priors have an unconditionally convergent expected utility. This would mean that ΣP(X_n)|U(X_n)| converges. Now suppose I observe evidence Y. ΣP(X_n|Y)|U(X_n)| = Σ|U(X_n)|P(X_n∩Y)/P(Y) ≤ Σ|U(X_n)|P(X_n)/P(Y) = (1/P(Y))·ΣP(X_n)|U(X_n)|. As long as P(Y) is nonzero, this must also converge.
If my prior expected utility is unconditionally convergent, then given any finite amount of evidence, so is my posterior.
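Here is a toy numerical check of that bound (the prior P(X_n) = 2^-n, the utility U(X_n) = (-1)^n·n, and the event Y are made up purely for illustration; the point is only that conditioning inflates ΣP(X_n)|U(X_n)| by at most a factor of 1/P(Y)):

```python
N = 60  # enough terms that the truncated tails are negligible

P = [2.0 ** -n for n in range(1, N + 1)]        # P(X_n) = 2^-n
U = [(-1) ** n * n for n in range(1, N + 1)]    # U(X_n) = (-1)^n * n

# Evidence Y: an arbitrary event, here "n is divisible by 3".
in_Y = [n % 3 == 0 for n in range(1, N + 1)]
P_Y = sum(p for p, y in zip(P, in_Y) if y)

prior_sum = sum(p * abs(u) for p, u in zip(P, U))                            # ~ 2.0
posterior_sum = sum((p / P_Y) * abs(u) for p, u, y in zip(P, U, in_Y) if y)

print(prior_sum, posterior_sum, prior_sum / P_Y)
assert posterior_sum <= prior_sum / P_Y + 1e-9   # the 1/P(Y) bound holds
```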
This means I only have to come up with a nice prior, and I'll never have to worry about evidence breaking expected utility.
I suspect this can be made even more powerful: given any amount of evidence, finite or otherwise, the posterior will almost surely be unconditionally convergent. Anyone want to prove it?
Now let's look at Pascal's Mugging. The problem here seems to be that someone could very easily make an arbitrarily powerful threat. However, in order for expected utility to converge unconditionally, either carrying out the threat must become unlikely faster than the disutility increases, or the threat itself must become unlikely that fast. In other words, either someone threatening 3^^^3 people is so unlikely to carry it out as to make the threat non-threatening, or the threat itself must be so difficult to make that you don't have to worry about it.
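As a sketch with invented decay rates (this is not a claim about any real mugger's probabilities), compare a schedule where the probability falls off faster than the disutility grows with one where it doesn't:

```python
def disutility(n):
    return 2.0 ** n          # threatened disutility grows exponentially in n

def total(prob, n_max=300):
    """Partial sum of P(threat of size n is real and carried out) * |U(size n)|."""
    return sum(prob(n) * disutility(n) for n in range(1, n_max + 1))

# Probability shrinks faster than disutility grows: the sum stays bounded (~1).
print(total(lambda n: 4.0 ** -n))

# Probability shrinks only polynomially: partial sums blow up as n_max grows.
print(total(lambda n: 1.0 / n ** 2, n_max=100))
print(total(lambda n: 1.0 / n ** 2, n_max=200))
```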
I totally agree and never meant to imply otherwise. But just as any consistent system of degrees of belief can be put into correspondence with the axioms of probability, so there are certain stipulations about what can reasonably be called a utility function.
I would argue that if you meet a conscious agent and your model of their utility function says that it doesn't converge (in the appropriate L1 norm of the appropriate modeled probability space), then something's wrong with that model of the utility function... not with the assumption that utility functions should converge. There are many subtleties, I'm sure, but non-integrable utility functions seem futile to me. If something can be well-modeled by a non-integrable utility function, then I'm fine updating my position, but in years of learning and teaching probability theory, I've never encountered anything that would convince me of that.
Doesn't this all assume that utility functions are real-valued?