In my previous post, I alluded to a result that could potentially convince a frequentist to favor Bayesian posterior distributions over confidence intervals. It’s called the complete class theorem, due to a statistician named Abraham Wald. Wald developed the structure of frequentist decision theory and characterized the class of decision rules that have a certain optimality property.
Frequentist decision theory reduces the decision process to its basic constituents: data, actions, true states, and incurred losses. It connects them with mathematical functions that capture their dependencies: the true state determines the probability distribution of the data; the decision rule maps data to a particular action; and the chosen action and true state together determine the incurred loss. To evaluate candidate decision rules, frequentist decision theory uses the risk function, defined as the expected loss of a decision rule with respect to the data distribution. The risk function therefore maps each (decision rule, true state) pair to the average loss under a hypothetical infinite replication of the decision problem.
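To pin this down in symbols (standard notation, not necessarily the notation of the original sources): write $\theta$ for the true state, $X$ for the data with distribution $P_\theta$, $\delta$ for a decision rule, and $L$ for the loss function. The risk of $\delta$ at $\theta$ is

$$
R(\theta, \delta) \;=\; \mathbb{E}_{X \sim P_\theta}\!\left[\, L\big(\theta, \delta(X)\big) \,\right],
$$

the loss averaged over hypothetical replications of the data-generating process.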
Since the true state is not known, decision rules must be evaluated over all possible true states. A decision rule is said to be “dominated” if there is another decision rule whose risk is never worse for any possible true state and is better for at least one true state. A decision rule which is not dominated is deemed “admissible”. (This is the optimality property alluded to above.) The punch line is that under some weak conditions, the complete class of admissible decision rules is precisely the class of rules which minimize a Bayesian posterior expected loss.
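In the same notation (again mine, offered as a sketch): $\delta'$ dominates $\delta$ if $R(\theta, \delta') \le R(\theta, \delta)$ for every $\theta$, with strict inequality for at least one $\theta$; $\delta$ is admissible if no rule dominates it. A Bayes rule with respect to a prior $\pi$ picks, for each observed $x$, the action that minimizes the posterior expected loss:

$$
\delta_\pi(x) \;=\; \operatorname*{arg\,min}_{a} \int L(\theta, a)\, \pi(\theta \mid x)\, d\theta .
$$

The complete class theorem then says that, under the weak conditions mentioned above, the admissible rules are exactly the rules of this form.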
(This result sparked interest in the Bayesian approach among statisticians in the 1950s. This interest eventually led to the axiomatic decision theory that characterizes rational agents as obeying certain fundamental constraints and proves that they act as if they had a prior distribution and a loss function.)
Taken together, the calibration results of the previous post and the complete class theorem suggest (to me, anyway) that irrespective of one's philosophical views on frequentism versus Bayesianism, perfect calibration is not possible in full generality for a rational decision-making agent.
The classic answer is that your confidence intervals are liable, when a large error occurs, to occasionally tell you that a mass is a negative number. Is this interval, which allows only negative masses, 90% likely to be correct? No, even if you used an experimental method that was a priori 90% likely to yield an interval covering the correct answer. In other words, treating the confidence level as a posterior probability and plugging it into an expected-utility calculation doesn't make sense. Frequentists think that ignoring this problem makes it go away.
It doesn't.
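To make the negative-mass scenario concrete, here is a minimal simulation sketch (my own toy example, not from the original discussion): a nonnegative mass is measured with Gaussian noise, and the usual 90% interval occasionally lies entirely below zero. Unconditionally the procedure covers the truth about 90% of the time, but conditional on reporting an all-negative interval it is guaranteed to be wrong.

```python
# Toy illustration (hypothetical numbers): a 90% confidence interval for a
# nonnegative mass, measured with Gaussian noise, sometimes lies entirely
# below zero. Its unconditional coverage is ~90%, but its coverage given an
# all-negative interval is exactly 0.
import numpy as np

rng = np.random.default_rng(0)

true_mass = 0.5       # hypothetical true mass; nonnegative by assumption
sigma = 1.0           # measurement noise standard deviation
z = 1.645             # approximate two-sided 90% normal quantile
n_trials = 1_000_000

x = rng.normal(true_mass, sigma, size=n_trials)   # simulated measurements
lower, upper = x - z * sigma, x + z * sigma       # 90% confidence intervals

covers = (lower <= true_mass) & (true_mass <= upper)
all_negative = upper < 0                          # interval allows only negative masses

print(f"unconditional coverage:             {covers.mean():.3f}")
print(f"fraction of all-negative intervals: {all_negative.mean():.4f}")
if all_negative.any():
    print(f"coverage among all-negative intervals: {covers[all_negative].mean():.3f}")
```

The first number comes out near 0.90, the last is exactly 0, which is the point: the 90% is a property of the procedure over replications, not of the particular interval in front of you.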