Can't an AI escape the dangers of Pascal's Mugging by having a decision theory that weighs against having exploitable decision theories according to the measure of their exploitability?
The dangers pointed to by the thought experiment aren't restricted to exploitation by an outside entity. An AI should be able to safely consider the hypothesis "If I don't destroy my future light cone, 3^^^3 people outside the universe will be killed" regardless of where the hypothesis came from.
But even if we're just worried about mugging, how could you possibly weight it enough? Even if paying once doomed me to spend the rest of my life paying $5 to muggers, the utility calculation still works out the same way.
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.