If 3^^^^3 people are in danger, the AI wishes to believe 3^^^^3 people are in danger.
This isn't about beliefs, this is about decisions. The process of epistemic rationality needn't be modified, only the process of instrumental rationality. Regardless of how much probability the AI assigns to the danger for 3^^^^3 people, it needn't be the right choice to decide based on a mere probability of such danger multiplied by the disutility of the harm done.
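To make the "probability multiplied by disutility" rule concrete, here is a rough sketch of the naive decision procedure being criticized. Everything in it is an assumption for illustration: 3^^^^3 is far too large to represent, so `PEOPLE_AT_RISK` is a stand-in, and the credence, the cost, and the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical stand-in: 3^^^^3 cannot be represented, so use 10**100 instead.
PEOPLE_AT_RISK = 10**100


def expected_disutility(probability, harm):
    """Naive rule: probability of the danger times the disutility of the harm."""
    return probability * harm


def naive_choice(probability, harm, cost_of_paying):
    """Pay whenever the expected harm of refusing exceeds the cost of paying."""
    if expected_disutility(probability, harm) > cost_of_paying:
        return "pay"
    return "refuse"


# Assumed numbers: a tiny credence in the mugger's claim and a small cost.
credence = Fraction(1, 10**50)
cost = 5

print(naive_choice(credence, PEOPLE_AT_RISK, cost))
```

Note that the credence stays tiny throughout; nothing on the belief side is touched. The mugger wins only because of how `naive_choice` combines the credence with the size of the claimed harm, and that decision step is the part that would need to change.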
Saving 3^^^^3 people is more than worth a bit of vulnerability to blackmail. If 3^^^^3 people are in danger, the AI wishes to believe 3^^^^3 people are in danger, and in that case "never surrender to blackmail" is a strictly worse strategy.
Unless having a decision process that surrenders to blackmail, and being known to have it, is what puts these people in danger in the first place. In that case, you can either modify your decision process so that you precommit to not surrendering to blackmail and prove this to others in advance, or pretend not to surrender while submitting to individual blackmails, provided such submissions can be kept secret enough that future agents are unlikely to be encouraged to blackmail you.
But this was just an example of an alternate decision theory, e.g. one that had hardwired exceptions against blackmail. I'm not actually saying it need be anything as absolute or simple as that -- if it were as simple as that, I'd have solved the Pascal's Mugger problem by saying "TDT plus don't submit to blackmail" instead of saying "weigh against your decision process by a factor proportional to its exploitability potential".
We seem to be thinking of slightly different problems. I wasn't thinking of the mugger's decision to blackmail you as dependent on their estimate that you will give in. There are possible muggers who will blackmail you regardless of your decision theory, and refusing to submit to their blackmail would cause them to produce large negative utilities.