I'll keep this quick:
In general, the problem presented by the Mugging is this: As we examine the utility of a given act for each possible world we could be in, in order from most probable to least probable, the utilities can grow much faster than the probabilities shrink. Thus it seems that the standard maxim "Maximize expected utility" is impossible to carry out, since there is no such maximum. When we go down the list of hypotheses, multiplying the utility of the act on each hypothesis by the probability of that hypothesis, the running sum does not converge to anything.
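To make the non-convergence concrete, here is a minimal sketch. The prior 2^-k over hypotheses and the two utility functions are toy assumptions chosen for illustration; they are not taken from any particular formalization of the Mugging.

```python
from fractions import Fraction

def partial_sums(num_hypotheses, utility):
    """Running totals of probability * utility over hypotheses 1..n.

    Hypothesis k is given the toy prior 2**-k (an illustrative assumption),
    so hypotheses are already ordered from most to least probable.
    """
    total = Fraction(0)
    sums = []
    for k in range(1, num_hypotheses + 1):
        total += Fraction(1, 2**k) * utility(k)
        sums.append(total)
    return sums

# Utilities that grow slowly: the running sum settles down (it stays below 2).
bounded = partial_sums(20, lambda k: k)

# Utilities that outrun the prior: each term is 2**k, so the sum keeps growing.
explosive = partial_sums(20, lambda k: 4**k)

print(float(bounded[-1]), int(explosive[-1]))
```

In the second case every additional hypothesis contributes more than all the previous ones combined, which is the sense in which there is no expected utility to maximize.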
Here's an idea that may fix this:
For every possible world W of complexity N, there's another possible world of complexity N+c that's just like W, except that it has two parallel, identical universes instead of just one. (If it matters, suppose that they are connected by an extra dimension.) (If this isn't obvious, say so and I can explain.)
Moreover, there's another possible world of complexity N+c+1 that's just like W except that it has four such parallel identical universes.
And, in general, there's a world of complexity N+c+X that has R parallel identical universes, where R is the largest number that can be specified in X bits of information.
So, take any given extreme mugger hypothesis like "I'm a matrix lord who will kill 3^^^^3 people if you don't give me $5." Uncontroversially, the probability of this hypothesis will be something much smaller than the probability of the default hypothesis. Let's be conservative and say the ratio is 1 in a billion.
(Here's the part I'm not so confident in)
Translating that into hypotheses with complexity values, that means that the mugger hypothesis has about 30 more bits of information in it than the default hypothesis.
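The arithmetic behind "about 30 bits" can be checked with a quick sketch. It assumes a complexity prior in which each extra bit of description halves a hypothesis's probability; the function name is made up for illustration.

```python
import math

def extra_bits(odds_ratio):
    """Extra bits of complexity implied by a probability ratio,
    assuming each additional bit halves the prior probability."""
    return math.log2(odds_ratio)

# A 1-in-a-billion ratio between the mugger and default hypotheses.
bits = extra_bits(1e9)
print(round(bits, 1))
```

Since 2^30 is a little over a billion, the result comes out just under 30.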
So, assuming c is small (and actually I think this assumption can be done away with), there's another hypothesis, as likely as the mugger hypothesis, which is that you are in a world exactly like the one in the default hypothesis, except that it contains R duplicate universes, where R is the largest number we can specify in 30 bits.
That number is very large indeed. (See the Busy Beaver function.) My guess is that it's going to be way, way, way larger than 3^^^^3. (It takes fewer than 30 bits to specify 3^^^^3, no?)
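For readers unfamiliar with the notation: 3^^^^3 is read here as Knuth's up-arrow notation with four arrows. A rough sketch of the recursion is below. The function name is my own, and it is only usable for tiny inputs, since 3^^^^3 itself is far beyond anything computable in practice. It does not settle how many bits the number takes to specify under any particular encoding; it only shows that the definition itself is short.

```python
def up_arrow(a, arrows, b):
    """Knuth's up-arrow: a followed by `arrows` arrows, then b.

    One arrow is ordinary exponentiation; each extra arrow iterates
    the operation one level below it. Only feasible for tiny inputs.
    """
    if arrows == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, arrows - 1, up_arrow(a, arrows, b - 1))

print(up_arrow(3, 1, 3))  # 3^3
print(up_arrow(3, 2, 3))  # 3^^3 = 3^(3^3)
print(up_arrow(2, 3, 3))  # 2^^^3 = 2^^4
```

Even 3^^^3 is already a power tower of 3s that is 3^^3 (about 7.6 trillion) levels high, so the four-arrow version is out of reach for this function.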
So this isn't exactly a formal solution yet, but it seems like it might be on to something. Perhaps our expected utility converges after all.
Thoughts?
(I'm very confused about all this, which is why I'm posting it in the first place.)
If someone makes a claim involving modest numbers, there is little need to distinguish between the scenario described and the utterance as evidence that such a scenario holds.
To me, it's not that there is only a certain amount of threat per letter you can use (where 3^^^3 tries to be efficient), but that the communicative details of the threat lose significance in the limit.
It's about how much credible threat can be conveyed in a speech bubble, and I don't think that has the form of "well, that depends on how many characters the bubble can hold". One does not raise their threat level by being able to say large numbers and then saying "I am going to hurt you X much". In the limit, an act that would register in my mind as you credibly making a big threat would hardly be recognised as speaking any more. It's the point where, instead of making air vibrate, you initiate supernovas to make a point.