There once lived a great man named E.T. Jaynes. He knew that Bayesian inference is the only way to do statistics logically and consistently, standing on the shoulders of misunderstood giants Laplace and Gibbs. On numerous occasions he vanquished traditional "frequentist" statisticians with his superior math, demonstrating to anyone with half a brain how the Bayesian way gives faster and more correct results in each example. The weight of evidence falls so heavily on one side that it makes no sense to argue anymore. The fight is over. Bayes wins. The universe runs on Bayes-structure.
Or at least that's what you believe if you learned this stuff from Overcoming Bias.
Like I did until two days ago, when Cyan hit me over the head with something utterly incomprehensible. I suddenly had to go out and understand this stuff, not just believe it. (The original intention, if I remember it correctly, was to impress you all by pulling a Jaynes.) Now I've come back and intend to provoke a full-on flame war on the topic. Because if we can have thoughtful flame wars about gender but not math, we're a bad community. Bad, bad community.
If you're like me two days ago, you kinda "understand" what Bayesians do: assume a prior probability distribution over hypotheses, use evidence to morph it into a posterior distribution over same, and bless the resulting numbers as your "degrees of belief". But chances are that you have a very vague idea of what frequentists do, apart from deriving half-assed results with their ad hoc tools.
Well, here's the ultra-short version: frequentist statistics is the art of drawing true conclusions about the real world instead of assuming prior degrees of belief and coherently adjusting them to avoid Dutch books.
And here's an ultra-short example of what frequentists can do: estimate 100 independent unknown parameters from 100 different sample data sets and have 90 of the estimates turn out to be true to fact afterward. Like, fo'real. Always 90% in the long run, truly, irrevocably and forever. No Bayesian method known today can reliably do the same: the outcome will depend on the priors you assume for each parameter. I don't believe you're going to get lucky with all 100. And even if I believed you a priori (ahem) that don't make it true.
(That's what Jaynes did to achieve his awesome victories: use trained intuition to pick good priors by hand on a per-sample basis. Maybe you can learn this skill somewhere, but not from the Intuitive Explanation.)
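To make the 90-out-of-100 claim concrete, here is a minimal simulation sketch. It assumes the simplest textbook setting, which the post does not specify: each data set is normally distributed with a known standard deviation, and the interval is the usual z-interval around the sample mean. The function name `count_covered`, the sample size and the range the "true" values are picked from are all illustrative choices.

```python
import random
from statistics import NormalDist, fmean


def count_covered(true_means, n=25, sigma=1.0, level=0.90, seed=0):
    """Return how many z-intervals cover the mean that generated their data.

    Assumption: data are normal with KNOWN sigma, so the interval is
    xbar +/- z * sigma / sqrt(n). No prior over the means is used anywhere.
    """
    rng = random.Random(seed)
    # Two-sided critical value (about 1.645 for a 90% interval).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    covered = 0
    for mu in true_means:
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if abs(xbar - mu) <= half_width:
            covered += 1
    return covered


if __name__ == "__main__":
    # 100 unrelated "true" values, picked arbitrarily for the demo.
    picker = random.Random(1)
    params = [picker.uniform(-50, 50) for _ in range(100)]
    hits = count_covered(params)
    print(hits, "of", len(params), "intervals cover the truth")
```

With only 100 parameters the count wobbles around 90 from run to run; the guarantee is about the long-run rate, and it holds no matter how the true values were chosen.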
How in the world do you do inference without a prior? Well, the characterization of frequentist statistics as "trickery" is totally justified: it has no single coherent approach and the tricks often give conflicting results. Most everybody agrees that you can't do better than Bayes if you have a clear-cut prior; but if you don't, no one is going to kick you out. We sympathize with your predicament and will gladly sell you some twisted technology!
Confidence intervals: imagine you somehow process some sample data to get an interval. Further imagine that hypothetically, for any given hidden parameter value, this calculation algorithm applied to data sampled under that parameter value yields an interval that covers it with probability 90%. Believe it or not, this perverse trick works 90% of the time without requiring any prior distribution on parameter values.
Unbiased estimators: you process the sample data to get a number whose expectation magically coincides with the true parameter value.
Hypothesis testing: I give you a black-box random distribution and claim it obeys a specified formula. You sample some data from the box and inspect it. Frequentism allows you to call me a liar while rejecting truthful claims no more than 10% of the time, guaranteed, no prior in sight. (Thanks Eliezer for calling out the mistake, and conchis for the correction!)
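The last two tricks can be checked by simulation as well. The sketch below is only an illustration under assumptions of my choosing: the estimator is the sample variance of normal data (dividing by n - 1 is unbiased, dividing by n is not), and the black box is claimed to be a standard normal, tested with a two-sided z-test on the sample mean. The function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean, pvariance, variance


def average_variance_estimates(true_sigma=2.0, n=5, trials=20000, seed=0):
    """Average two variance estimators over many small samples.

    Returns (mean of n-1 estimates, mean of n estimates). The first should
    land near true_sigma ** 2; the second near (n - 1) / n times that.
    """
    rng = random.Random(seed)
    unbiased, biased = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, true_sigma) for _ in range(n)]
        unbiased.append(variance(sample))   # divides by n - 1
        biased.append(pvariance(sample))    # divides by n
    return fmean(unbiased), fmean(biased)


def false_rejection_rate(n=20, alpha=0.10, trials=20000, seed=0):
    """Fraction of trials in which a TRUE claim gets rejected.

    Claim under test: the box produces N(0, 1) draws. The box really does,
    so every rejection counted here is a false accusation.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        xbar = fmean([rng.gauss(0.0, 1.0) for _ in range(n)])
        z = xbar * n ** 0.5  # standardized sample mean under the claim
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print("variance estimates (n-1, n):", average_variance_estimates())
    print("false rejection rate:", false_rejection_rate())
```

Note what the second function does and does not measure: it bounds how often a truthful claim is rejected, which is exactly the corrected wording above. It says nothing about how often a rejection is mistaken overall.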
But this is getting too academic. I ought to throw you dry wood, good flame material. This hilarious PDF from Andrew Gelman should do the trick. Choice quote:
Well, let me tell you something. The 50 states aren't exchangeable. I've lived in a few of them and visited nearly all the others, and calling them exchangeable is just silly. Calling it a hierarchical or multilevel model doesn't change things - it's an additional level of modeling that I'd rather not do. Call me old-fashioned, but I'd rather let the data speak without applying a probability distribution to something like the 50 states which are neither random nor a sample.
As a bonus, the bibliography to that article contains such marvelous titles as "Why Isn't Everyone a Bayesian?" And Larry Wasserman's followup is also quite disturbing.
Another stick for the fire is provided by Shalizi, who (among other things) makes the correct point that a good Bayesian must never be uncertain about the probability of any future event. That's why he calls Bayesians "Often Wrong, Never In Doubt":
The Bayesian, by definition, believes in a joint distribution of the random sequence X and of the hypothesis M. (Otherwise, Bayes's rule makes no sense.) This means that by integrating over M, we get an unconditional, marginal probability for f.
For my final quote it seems only fair to add one more polemical summary of Cyan's point that made me sit up and look around in a bewildered manner. Credit to Wasserman again:
Pennypacker: You see, physics has really advanced. All those quantities I estimated have now been measured to great precision. Of those thousands of 95 percent intervals, only 3 percent contained the true values! They concluded I was a fraud.
van Nostrand: Pennypacker you fool. I never said those intervals would contain the truth 95 percent of the time. I guaranteed coherence not coverage!
Pennypacker: A lot of good that did me. I should have gone to that objective Bayesian statistician. At least he cares about the frequentist properties of his procedures.
van Nostrand: Well I'm sorry you feel that way Pennypacker. But I can't be responsible for your incoherent colleagues. I've had enough now. Be on your way.
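Pennypacker's misfortune is easy to reproduce in a toy model. The sketch below is my own illustration, not anything from Wasserman's text: it assumes normal data with known standard deviation and a conjugate normal prior, and compares how often the 95% credible interval covers the truth when the true values really do come from the prior and when they sit far away from it. All names and numbers are assumptions picked for the demo.

```python
import random
from statistics import NormalDist, fmean


def credible_interval_coverage(true_thetas, prior_mean=0.0, prior_sd=0.5,
                               n=5, sigma=1.0, level=0.95, seed=0):
    """Fraction of central credible intervals that contain the true theta.

    Model (assumed): x_i ~ N(theta, sigma^2), prior theta ~ N(prior_mean,
    prior_sd^2). The posterior is then normal with a precision-weighted mean.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = n / sigma ** 2
    post_sd = (prior_prec + data_prec) ** -0.5
    covered = 0
    for theta in true_thetas:
        xbar = fmean([rng.gauss(theta, sigma) for _ in range(n)])
        post_mean = (prior_prec * prior_mean + data_prec * xbar) / (
            prior_prec + data_prec)
        if abs(theta - post_mean) <= z * post_sd:
            covered += 1
    return covered / len(true_thetas)


if __name__ == "__main__":
    nature = random.Random(1)
    from_prior = [nature.gauss(0.0, 0.5) for _ in range(10000)]
    print("truths drawn from the prior:",
          credible_interval_coverage(from_prior))
    print("truths all equal to 3.0:",
          credible_interval_coverage([3.0] * 10000))
```

When nature agrees with the prior, the intervals cover about as advertised; when it doesn't, the same perfectly coherent procedure can miss almost every time.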
There's often good reason to advocate a correct theory over a wrong one. But all this evidence (ahem) shows that switching to Guardian of Truth mode was, at the very least, premature for me. Bayes isn't the correct theory to make conclusions about the world. As of today, we have no coherent theory for making conclusions about the world. Both perspectives have serious problems. So do yourself a favor and switch to truth-seeker mode.
Finally, the electron is found in some definite polarisation. You just don't know which before actually doing the experiment (same as for the coin), and you can't, even in principle, make any observation that tells you the result with more certainty in advance (at least according to the present model of physics; don't forget that non-local hidden variables are not ruled out). For the coin you can. So the difference is that the future of a classical system can be predicted with unlimited certainty from its present state, while the future of a quantum system cannot. This doesn't necessarily mean that the future is not determined. One can adopt the viewpoint (I think it was even suggested on OB/LW in Eliezer's posts about timeless physics) that the future is symmetric to the past: it exists in the whole history of the universe, and if we don't know it now, that's our ignorance. I suppose you would agree that not knowing about the electron's past is a matter of our ignorance rather than a property of the electron itself, regardless of whether we are able to calculate it from presently available information, even in principle (i.e. using present theories).
I also think there is little merit in arguing about terminology, and this discussion tends in that direction. Practically, there's no difference between saying that quantum probabilities are "properties of the system" or "of the predictor". Either we can predict or we can't, and that's all that matters. Beware of the clause "in principle", as it often only obscures the debate.
Edit: to formulate it a little differently, predictability is an instance of regularity in the universe, i.e. our ability to compress the data of the whole history of the universe into some brief set of laws and a possibly not so brief set of initial conditions, which is nevertheless a much smaller amount of information than the history of the universe recorded at each point and time instant. As we do not have this huge pack of information and thus can't say to what extent it is compressible, we use theories that are based largely on induction, which is itself a particular bias. We don't even know whether the theories we use apply at every time and place, or to every system universally. Frequentists seem to distinguish this uncertainty, which they largely ignore in practice, from uncertainty as a property of the system. So, as I understand the state of affairs, a frequentist is satisfied with a theory (which is a compression algorithm applicable to the information about the universe) that includes calling the random number generator on some occasions (e.g. when dealing with dice or electrons), and he calls the uncertainty so induced a "property of the system". On the other hand, the uncertainty about the theory itself is a different kind of "meta-uncertainty".
The Bayesian approach seems to me more elegant (and friendlier to Occam's razor), as it doesn't introduce different sorts of uncertainty. It also fits better with the view of physical laws as compression algorithms, as it doesn't distinguish between data and theories with regard to their uncertainty. One may just accept that the history of the universe needn't be compressible to the data available at the moment, and use induction to estimate future states of the world in the same way as one estimates the limits of validity of presently formulated physical laws.