"Solving" Pascal's Mugging involves giving an explicit reasoning system and showing that it makes the right decision.
It's not enough to just say "your confidence has to go down more than their claimed reward goes up". That part is obvious. The hard part is coming up with actual explicit rules that do that, particularly ones that don't fall apart in other situations (e.g. the decision system "always do nothing" can't be Pascal-mugged, but has serious problems).
Another thing not addressed here is that the mugger may be a hypothetical. For example, if the AI generates hypotheses where the universe affects 3^^^^3 people then all decisions will be dominated by these hypotheses because their outcomes outweigh their prior by absurd margins. How do you detect these bad hypotheses? How do you penalize them without excluding them? Should you exclude them?
Please give a more concrete situation with actual numbers and algorithms.
I think you'll find the argument is clear without any formalization if you recognize that it is NOT the usual claim that confidence goes down. Rather, it's that the confidence falls below its contrary.
In philH's terms, you're engaging in pattern matching rather than taking the argument on its own terms.
Since Pascal’s Mugging is well known on LW, I won’t describe it at length. Suffice it to say that a mugger tries to blackmail you by threatening enormous harm through a completely mysterious mechanism. If the threatened harm is large enough, its sheer size eventually outweighs your doubts about the mechanism.
I have a reasonably simple solution to Pascal’s Mugging. In four steps, here it is:
Pascal’s Mugging induces us to look at the likelihood of the claim in abstraction from the fact that the claim is made. The paradox can be solved by breaking the probability that the mugger’s claim is true into two parts: the probability of the claim itself (its simplicity) and the probability that the mugger is truthful. Even if the probability of magical harm doesn’t decrease when the amount of harm increases, the probability that the mugger is truthful decreases continuously as the amount of harm predicted increases.
Solving the paradox in Pascal’s Mugging depends on recognizing that, if the logic were sound, it would engage muggers in a game where they try to pick the highest practicable number to represent the amount of harm. But this means that the higher the number, the more likely they are to be playing this game, which undermines the logic that was assumed to be sound.
But solving Pascal’s Mugging also depends on recognizing that the evidence that the mugger is maximizing can lower the probability below that of the same harm when no mugger has claimed it. It involves recognizing that, when it is almost certain that the claim is motivated by something unrelated to the claim’s truth, the claim can become less believable than if it hadn’t been expressed. The mugger’s maximizing motivation is evidence against his claim.
If someone presents you with a number representing the amount of threatened harm, 3^3^3..., continued for as long as a computer can print when the printer is allowed to run for, say, a decade, you should think this result less probable than if no one had ever presented you with the tome. While people are more likely to be telling the truth than to be lying, if you are sufficiently sure they are lying, their testimony counts against their claim.
The proof is the same as the proof of the (also counter-intuitive) proposition that failure to find (some definite amount of) evidence for a theory constitutes negative evidence. The mugger has elicited your search for evidence, but because of the mugger’s clear interest in falsehood, you find that evidence wanting.
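To make the claim that testimony can count against itself concrete, here is a minimal sketch of the Bayesian update involved. The function name and every number in it are illustrative assumptions, not figures from the argument above; the only point is that when the claim is more likely to be heard from a maximizing mugger than from a truthful one, the posterior falls below the prior.

```python
from fractions import Fraction


def posterior(prior, p_claim_if_true, p_claim_if_false):
    """P(harm is real | the mugger made the claim), by Bayes' rule."""
    joint_true = prior * p_claim_if_true
    joint_false = (1 - prior) * p_claim_if_false
    return joint_true / (joint_true + joint_false)


# Illustrative numbers only (assumptions for this sketch).
prior = Fraction(1, 1000)           # belief in the harm before anyone claims it
p_claim_if_true = Fraction(1, 100)  # chance of hearing this exact claim if it is true
p_claim_if_false = Fraction(1, 10)  # chance of hearing it from a maximizing mugger if it is false

after = posterior(prior, p_claim_if_true, p_claim_if_false)
print(after, after < prior)  # prints: 1/9991 True
```

With these assumed numbers the claim is ten times more likely to be made when false than when true, so hearing it moves the probability from 1/1000 down to 1/9991. Reversing the two likelihoods would move it up instead, which is the ordinary case where testimony supports a claim.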