The dangers pointed to by the thought experiment aren't restricted to exploitation by an outside entity. An AI should be able to safely consider the hypothesis "If I don't destroy my future light cone, 3^^^3 people outside the universe will be killed" regardless of where the hypothesis came from.
But even if we're just worried about mugging, how could you possibly weight it enough? Even if paying once doomed me to spend the rest of my life paying $5 to muggers, the utility calculation still works out the same way.
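To make that arithmetic concrete, here is a rough sketch of the naive comparison, done in log10 space because the numbers overflow anything else. Everything in it is an illustrative assumption rather than part of the original argument: the credence of 1e-100, the rate of ten muggers a day for 80 years, the one-unit-per-victim and one-unit-per-dollar utility scale, and the use of 3^^4 as a stand-in that is vastly smaller than 3^^^3.

```python
import math

# Assumption: 3^^^3 is far too large to represent, so use 3^^4 = 3^(3^27)
# as a stand-in. It is a gross underestimate, which only strengthens the point.
LOG10_VICTIMS_STAND_IN = 3**27 * math.log10(3)

# Assumption: a hypothetical, absurdly small credence that the mugger is honest.
LOG10_P_TRUTH = -100.0

# Assumption: paying once means paying $5 to ten muggers a day for 80 years,
# with one dollar lost counted as one unit of disutility.
LIFETIME_COST = 5 * 10 * 365 * 80
LOG10_LIFETIME_COST = math.log10(LIFETIME_COST)


def naive_should_pay(log10_harm, log10_p_truth, log10_cost):
    """Naive rule: pay if harm * P(truth) exceeds the cost of paying.

    All arguments are base-10 logarithms, so the product becomes a sum.
    """
    return log10_harm + log10_p_truth > log10_cost


decision = naive_should_pay(
    LOG10_VICTIMS_STAND_IN, LOG10_P_TRUTH, LOG10_LIFETIME_COST
)
print("lifetime cost in dollars:", LIFETIME_COST)
print("naive rule says pay:", decision)
```

Under these assumptions the threat term has an exponent in the trillions while the lifetime cost has an exponent of about 6, so no plausible penalty for encouraging future muggers changes the naive verdict.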
My idea is as follows:
Mugger: Give me 5 dollars, or I'll torture 3^^^3 sentient people across the omniverse using my undetectable magical powers.
AI: If I make my decision on this and similar trades using a decision process DP0 that compares disutility(3^^^3 torture) * P(you're telling the truth) with disutility(giving ...