Frequentist Magic vs. Bayesian Magic
[I posted this to open thread a few days ago for review. I've only made some minor editorial changes since then, so no need to read it again if you've already read the draft.]
This is a belated reply to cousin_it's 2009 post Bayesian Flame, which claimed that frequentists can give calibrated estimates for unknown parameters without using priors:
And here's an ultra-short example of what frequentists can do: estimate 100 independent unknown parameters from 100 different sample data sets and have 90 of the estimates turn out to be true to fact afterward. Like, fo'real. Always 90% in the long run, truly, irrevocably and forever.
And indeed they can. Here's the simplest example I can think of that illustrates the spirit of frequentism:
Suppose there is a machine that produces biased coins. You don't know how the machine works, except that each coin it produces is either biased towards heads (in which case each toss of the coin will land heads with probability .9 and tails with probability .1) or towards tails (in which case each toss of the coin will land tails with probability .9 and heads with probability .1). For each coin, you get to observe one toss, and then have to state whether you think it's biased towards heads or tails, and the probability that your answer is correct.
Let's say that you decide to follow this rule: after observing heads, always answer "the coin is biased towards heads with probability .9" and after observing tails, always answer "the coin is biased towards tails with probability .9". Do this for a while, and it will turn out that 90% of the time you are right about which way the coin is biased, no matter how the machine actually works. The machine might always produce coins biased towards heads, or always towards tails, or decide based on the digits of pi, and it wouldn't matter—you'll still be right 90% of the time. (To verify this, notice that in the long run you will answer "heads" for 90% of the coins actually biased towards heads, and "tails" for 90% of the coins actually biased towards tails.) No priors needed! Magic!
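The long-run 90% claim is easy to check by simulation. Here's a minimal sketch: the machine's choice for each coin is modeled as an arbitrary function of the coin's index (the `machine_policy` interface is my own invention for illustration), and the always-trust-the-toss rule comes out right about 90% of the time no matter which policy the machine uses.

```python
import random

def simulate(machine_policy, n_coins=100_000, seed=0):
    """Follow the rule: after seeing heads, guess 'biased towards heads';
    after seeing tails, guess 'biased towards tails'.
    Return the fraction of coins for which the guess was correct."""
    rng = random.Random(seed)
    correct = 0
    for i in range(n_coins):
        biased_heads = machine_policy(i)       # how the machine chose this coin
        p_heads = 0.9 if biased_heads else 0.1
        saw_heads = rng.random() < p_heads     # the one observed toss
        guess_heads = saw_heads                # the frequentist rule
        correct += (guess_heads == biased_heads)
    return correct / n_coins

# Three very different machines; the rule is right ~90% of the time for each.
for policy in (lambda i: True,          # always biased towards heads
               lambda i: False,         # always biased towards tails
               lambda i: i % 3 == 0):   # some deterministic pattern
    print(round(simulate(policy), 3))
```

The point the simulation makes concrete: nothing about the machine's policy enters the calibration guarantee, because the rule is right with probability .9 conditional on *either* kind of coin.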
Case study: abuse of frequentist statistics
Recently, a colleague was reviewing an article whose key justification rested on some statistics that seemed dodgy to him, so he came to me for advice. (I guess my boss, the resident statistician, was out of his office.) Now, I'm no expert in frequentist statistics. My formal schooling in frequentist statistics comes from my undergraduate chemical engineering curriculum -- I wouldn't rely on it for consulting. But I've been working for someone who is essentially a frequentist for a year and a half, so I've had some hands-on experience. My boss hired me on the strength of my experience with Bayesian statistics, which I taught myself in grad school, and one thing reading the Bayesian literature voraciously will equip you for is critiquing frequentist statistics. So I felt competent enough to take a look.1
Frequentist Statistics are Frequently Subjective
Andrew Gelman recently responded to a commenter on the Yudkowsky/Gelman diavlog; the commenter complained that Bayesian statistics were too subjective and lacked rigor. I shall explain why this is unbelievably ironic, but first, the comment itself:
However, the fundamental belief of the Bayesian interpretation, that all probabilities are subjective, is problematic -- for its lack of rigor... One of the features of frequentist statistics is the ease of testability. Consider a binomial variable, like the flip of a fair coin. I can calculate that the probability of getting seven heads in ten flips is 11.71875%... At some point a departure from the predicted value may appear, and frequentist statistics give objective confidence intervals that can precisely quantify the degree to which the coin departs from fairness...
Gelman's first response is "Bayesian probabilities don't have to be subjective." Not sure I can back him on that; probability is ignorance and ignorance is a state of mind (although indeed, some Bayesian probabilities can correspond very directly to observable frequencies in repeatable experiments).
My own response is that frequentist statistics are far more subjective than Bayesian likelihood ratios. Exhibit One is the notion of "statistical significance" (which is what the above comment is actually talking about, although "confidence intervals" have almost the same problem). Steven Goodman offers a nicely illustrated example: Suppose we have at hand a coin, which may be fair (the "null hypothesis") or perhaps biased in some direction. So lo and behold, I flip the coin six times, and I get the result TTTTTH. Is this result statistically significant, and if so, what is the p-value - that is, the probability of obtaining a result at least this extreme?
Well, that depends. Was I planning to flip the coin six times, and count the number of tails? Or was I planning to flip the coin until it came up heads, and count the number of trials? In the first case, the probability of getting "five tails or more" from a fair coin is 11%, while in the second case, the probability of a fair coin requiring "at least five tails before seeing one heads" is 3%.
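Both p-values can be checked in a few lines. A quick sketch: the first stopping rule makes the tail count binomial, the second makes the data "first heads on toss six or later", which just means the first five tosses were all tails.

```python
from math import comb

# Stopping rule 1: flip exactly 6 times, count tails.
# p-value = P(5 or more tails in 6 flips of a fair coin)
p1 = sum(comb(6, k) for k in (5, 6)) / 2**6   # 7/64, about 11%

# Stopping rule 2: flip until the first heads, count the flips.
# p-value = P(first heads needs >= 6 flips) = P(first 5 flips all tails)
p2 = 0.5**5                                   # 1/32, about 3%

print(round(p1, 3), round(p2, 3))
```

Same coin, same data, two different p-values, depending only on what the experimenter was privately planning to do.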
Whereas a Bayesian looks at the experimental result and says, "I can now calculate the likelihood ratio (evidential flow) between all hypotheses under consideration. Since your state of mind doesn't affect the coin in any way - doesn't change the probability of a fair coin or biased coin producing this exact data - there's no way your private, unobservable state of mind can affect my interpretation of your experimental results."
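To make the contrast concrete: the likelihood of the exact sequence TTTTTH depends only on each hypothesis's tails probability, not on the stopping rule, so the likelihood ratio is the same under either experimental plan. A sketch, using an illustrative alternative hypothesis of a coin biased towards tails with probability .9 (Goodman's example leaves the alternative unspecified; the .9 is borrowed from the coin-machine example above):

```python
# Likelihood of the exact sequence TTTTTH under a given tails probability.
# Any stopping-rule-dependent constant would multiply both likelihoods
# equally and cancel in the ratio.
def seq_likelihood(p_tails, n_tails=5, n_heads=1):
    return p_tails**n_tails * (1 - p_tails)**n_heads

l_fair   = seq_likelihood(0.5)   # fair coin: 0.5**6
l_biased = seq_likelihood(0.9)   # biased towards tails: 0.9**5 * 0.1
ratio = l_biased / l_fair        # ~3.78 : 1 in favor of the biased hypothesis

print(round(ratio, 2))
```

The experimenter's intentions never appear in the calculation; only the data and the hypotheses do.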