I'll keep this quick:
In general, the problem presented by the Mugging is this: As we examine the utility of a given act for each possible world we could be in, in order from most probable to least probable, the utilities can grow much faster than the probabilities shrink. Thus it seems that the standard maxim "Maximize expected utility" is impossible to carry out, since there is no such maximum. When we go down the list of hypotheses, multiplying the utility of the act on each hypothesis by the probability of that hypothesis, the running sum does not converge to anything.
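A toy illustration of the non-convergence may help. The numbers here are assumptions chosen only to make the point, not anything derived from a real prior: the n-th hypothesis gets probability 2^-n and the act's utility on it is 4^n, so each term of the expected-utility sum is 2^n and the partial sums grow without bound.

```python
from fractions import Fraction

def partial_sums(num_terms):
    """Partial sums of sum_n p_n * u_n for a made-up hypothesis list."""
    total = Fraction(0)
    sums = []
    for n in range(1, num_terms + 1):
        probability = Fraction(1, 2 ** n)  # assumed prior: halves at each step
        utility = 4 ** n                   # assumed utility: quadruples at each step
        total += probability * utility     # each term works out to 2**n
        sums.append(total)
    return sums

sums = partial_sums(10)
print([int(s) for s in sums])
# [2, 6, 14, 30, 62, 126, 254, 510, 1022, 2046]
```

The probabilities still sum to 1, but the running total roughly doubles with every hypothesis added, so there is no limit to call "the" expected utility.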
Here's an idea that may fix this:
For every possible world W of complexity N, there's another possible world of complexity N+c that's just like W, except that it has two parallel, identical universes instead of just one. (If it matters, suppose that they are connected by an extra dimension.) (If this isn't obvious, say so and I can explain.)
Moreover, there's another possible world of complexity N+c+1 that's just like W except that it has four such parallel identical universes.
And, in general, there's a world of complexity N+c+X that has R parallel identical universes, where R is the largest number that can be specified in X bits of information.
So, take any given extreme mugger hypothesis like "I'm a matrix lord who will kill 3^^^^3 people if you don't give me $5." Uncontroversially, the probability of this hypothesis will be something much smaller than the probability of the default hypothesis. Let's be conservative and say the ratio is 1 in a billion.
(Here's the part I'm not so confident in)
Translating that into hypotheses with complexity values, that means that the mugger hypothesis has about 30 more bits of information in it than the default hypothesis.
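The arithmetic behind "about 30 bits" can be checked in a couple of lines. This is a minimal sketch that assumes a prior in which each extra bit of complexity halves a hypothesis's probability; the helper name `extra_bits` is just for illustration.

```python
import math

def extra_bits(probability_ratio):
    """Bits of extra complexity implied by a probability ratio,
    assuming each additional bit halves the prior probability."""
    return -math.log2(probability_ratio)

bits = extra_bits(1e-9)  # the "1 in a billion" ratio from above
print(round(bits, 1))
# 29.9
```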
So, assuming c is small (and actually I think this assumption can be done away with), there's another hypothesis, just as likely as the mugger hypothesis: that you are in a world exactly like the one in the default hypothesis, except that it contains R duplicate universes, where R is the largest number we can specify in 30 bits.
That number is very large indeed. (See the Busy Beaver function.) My guess is that it's going to be way way way larger than 3^^^^3. (It takes fewer than 30 bits to specify 3^^^^3, no?)
So this isn't exactly a formal solution yet, but it seems like it might be on to something. Perhaps our expected utility converges after all.
Thoughts?
(I'm very confused about all this which is why I'm posting it in the first place.)
OH ok I get it now: "But clearly re-arranging terms doesn't change the expected utility, since that's just the sum of all terms." That's what I guess I have to deny. Or rather, I accept that (I agree that EU = infinity for both A and B) but I think that since A is better than B in every possible world, it's better than B simpliciter.
The reshuffling example you give is an example where A is not better than B in every possible world. That's the sort of example that I claim is not realistic, i.e. not the actual situation we find ourselves in. Why? Well, that was what I tried to argue in the OP--that in the actual situation we find ourselves in, the action A that is best in the simplest hypothesis is also better.... well, oops, I guess it's not better in every possible world, but it's better in every possible finite set of possible worlds such that the set contains all the worlds simpler than its simplest member.
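Here is a rough sketch of that weaker criterion, under one reading of it (an assumption, not a settled definition): order the worlds from simplest to most complex, and call A better if its probability-weighted utility beats B's on every initial segment of that ordering. The function name and the toy numbers are made up for illustration.

```python
from fractions import Fraction
from itertools import accumulate

def dominates_on_every_prefix(probabilities, utilities_a, utilities_b):
    """True if A's partial expected utility exceeds B's on every
    initial segment of worlds (ordered simplest first)."""
    partial_a = accumulate(p * u for p, u in zip(probabilities, utilities_a))
    partial_b = accumulate(p * u for p, u in zip(probabilities, utilities_b))
    return all(a > b for a, b in zip(partial_a, partial_b))

# Toy worlds, simplest first, with assumed probabilities 1/2, 1/4, 1/8.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]
act_a = [10, 0, 0]  # A wins big in the simplest world only
act_b = [0, 4, 8]   # B is better in the two more complex worlds

print(dominates_on_every_prefix(probs, act_a, act_b))  # True
print(dominates_on_every_prefix(probs, act_b, act_a))  # False
```

Here A is not better than B in every world, but it stays ahead on every initial segment, which is the kind of comparison that still makes sense when the full sums diverge.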
I'm guessing this won't be too helpful to you since, obviously, you already read the OP. But in that case I'm not sure what else to say. Let me know if you are still interested and I'll try to rephrase things.
Sorry for taking so long to get back to you; I check this forum infrequently.