In my previous post, I alluded to a result that could potentially convince a frequentist to favor Bayesian posterior distributions over confidence intervals. It’s called the complete class theorem, due to a statistician named Abraham Wald. Wald developed the structure of frequentist decision theory and characterized the class of decision rules that have a certain optimality property.
Frequentist decision theory reduces the decision process to its basic constituents: data, actions, true states, and incurred losses. It connects them using mathematical functions that characterize their dependencies: the true state determines the probability distribution of the data, the decision rule maps data to a particular action, and the chosen action and the true state together determine the incurred loss. To evaluate potential decision rules, frequentist decision theory uses the risk function, which is defined as the expected loss of a decision rule with respect to the data distribution. The risk function therefore maps (decision rule, true state) pairs to the average loss under a hypothetical infinite replication of the decision problem.
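To make these pieces concrete, here is a minimal sketch in Python. Everything in it is a made-up assumption for illustration: a toy problem with two states, binary data, two actions, and arbitrary likelihood and loss numbers. The names `STATES`, `LIKELIHOOD`, `LOSS`, `risk`, and `follow_data` are hypothetical, not taken from any library.

```python
# Toy decision problem; every number below is an illustrative assumption.
STATES = ["theta0", "theta1"]
DATA = [0, 1]
ACTIONS = ["a0", "a1"]

# P(x | state): the true state determines the distribution of the data.
LIKELIHOOD = {
    "theta0": {0: 0.8, 1: 0.2},
    "theta1": {0: 0.3, 1: 0.7},
}

# loss(state, action): the chosen action and the true state determine the loss.
LOSS = {
    ("theta0", "a0"): 0.0, ("theta0", "a1"): 1.0,
    ("theta1", "a0"): 2.0, ("theta1", "a1"): 0.0,
}


def risk(rule, state):
    """Expected loss of `rule` (a dict mapping data -> action) if `state` is true.

    The expectation is over the data distribution given the state, so the
    result is a function of the (decision rule, true state) pair.
    """
    return sum(LIKELIHOOD[state][x] * LOSS[(state, rule[x])] for x in DATA)


# A decision rule maps each possible observation to an action.
follow_data = {0: "a0", 1: "a1"}

for state in STATES:
    print(state, risk(follow_data, state))
```

With these assumed numbers, the rule `follow_data` has risk 0.2 when `theta0` is true and 0.6 when `theta1` is true. The rule gets one risk value per possible true state, not a single score.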
Since the true state is not known, decision rules must be evaluated over all possible true states. A decision rule is said to be “dominated” if there is another decision rule whose risk is never worse for any possible true state and is strictly better for at least one true state. A decision rule that is not dominated is deemed “admissible”. (This is the optimality property alluded to above.) The punch line is that under some weak conditions, the class of admissible decision rules is precisely the class of rules that minimize a Bayesian posterior expected loss for some prior distribution (in the general statement, limits of such rules have to be included as well).
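Dominance and admissibility can be checked by brute force in a small finite problem. The sketch below is an illustration under stated assumptions, not a proof of the theorem: the two-state problem and all its numbers are invented, only non-randomized rules are enumerated, and the names `risk_vector`, `dominates`, `bayes_rule`, and `ADMISSIBLE` are hypothetical. It lists every rule, discards the dominated ones, and then computes the rule that minimizes posterior expected loss for a few priors.

```python
from itertools import product

# Toy decision problem; every number below is an illustrative assumption.
STATES = ["theta0", "theta1"]
DATA = [0, 1]
ACTIONS = ["a0", "a1"]
LIKELIHOOD = {"theta0": [0.8, 0.2], "theta1": [0.3, 0.7]}  # P(x | state)
LOSS = {
    ("theta0", "a0"): 0.0, ("theta0", "a1"): 1.0,
    ("theta1", "a0"): 2.0, ("theta1", "a1"): 0.0,
}


def risk_vector(rule):
    """Risk of `rule` (a tuple of actions indexed by x) under each true state."""
    return tuple(
        sum(LIKELIHOOD[s][x] * LOSS[(s, rule[x])] for x in DATA) for s in STATES
    )


def dominates(r1, r2):
    """True if risk vector r1 is never worse than r2 and strictly better somewhere."""
    return all(a <= b for a, b in zip(r1, r2)) and any(a < b for a, b in zip(r1, r2))


# Enumerate all non-randomized rules: one action for each possible observation.
ALL_RULES = list(product(ACTIONS, repeat=len(DATA)))
RISKS = {rule: risk_vector(rule) for rule in ALL_RULES}

# Admissible here means: not dominated by any other enumerated rule.
ADMISSIBLE = [
    rule for rule in ALL_RULES
    if not any(dominates(RISKS[other], RISKS[rule]) for other in ALL_RULES)
]


def bayes_rule(prior):
    """For each observation, pick the action minimizing posterior expected loss."""
    rule = []
    for x in DATA:
        # Unnormalized posterior weights; the normalizer does not change the argmin.
        weights = {s: prior[s] * LIKELIHOOD[s][x] for s in STATES}
        rule.append(
            min(ACTIONS, key=lambda a: sum(weights[s] * LOSS[(s, a)] for s in STATES))
        )
    return tuple(rule)


for p in (0.1, 0.5, 0.9):
    rule = bayes_rule({"theta0": p, "theta1": 1.0 - p})
    print(p, rule, rule in ADMISSIBLE)
```

In this toy problem one of the four rules (doing the opposite of what the data suggest) is dominated, and each of the three priors tried yields a different one of the three remaining rules. The check says nothing about randomized rules or infinite state spaces, which is where the theorem's conditions do their work.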
(This result sparked interest in the Bayesian approach among statisticians in the 1950s. This interest eventually led to the axiomatic decision theory that characterizes rational agents as obeying certain fundamental constraints and proves that they act as if they had a prior distribution and a loss function.)
Taken together, the calibration results of the previous post and the complete class theorem suggest (to me, anyway) that irrespective of one's philosophical views on frequentism versus Bayesianism, perfect calibration is not possible in full generality for a rational decision-making agent.
The absence of comments here doesn't reflect well on us, but this is a tricky topic. I'm honestly trying to get to the bottom of this and the bottom ain't in sight yet.
EDIT: I'm not sure a prior that matched confidence intervals would be a good thing. See point III.b "Truncated exponential distribution" in this pdf for an example where a 90% confidence interval gives a result that's actually logically ruled out by the sample. (Cyan, am I restating obvious stuff? Too stupid to say for sure yet.)
To be honest, I'm not shocked that most people aren't equipped to or interested in grappling with this stuff. If I weren't a Bayesian working for a frequentist I wouldn't be thinking so much about why frequentists do what they do. I was hoping that the more mathematically inclined folks would find this argument startling enough to try to knock it down -- I'd be happy to be wrong.
It isn't so much that we want posterior intervals to match some crappy-arsed confidence interval. We just want them to be calibrated, and as near as I can tell, calibration is equi...