cupholder comments on Case study: abuse of frequentist statistics - Less Wrong
This is going to sound silly, but...could someone explain frequentist statistics to me?
Here's my current understanding of how it works:
We've got some hypothesis H, whose truth or falsity we'd like to determine. So we go out and gather some evidence E. But now, instead of trying to quantify our degree of belief in H (given E) as a conditional probability estimate using Bayes' Theorem (which would require us to know P(H), P(E|H), and P(E|~H)), what we do is simply calculate P(E|~H) (more precisely, the probability under ~H of evidence at least as extreme as E, the "p-value"; techniques for doing this being of course the principal concern of statistics texts). We then place H into one of two bins depending on whether the p-value is below some threshold (the "significance level") that somebody decided was "low": if it is, we put H into the "accepted" bin (or, as they say, we reject the null hypothesis ~H); otherwise, we put H into the "not accepted" bin (that is, we fail to reject ~H).
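If a concrete sketch helps, here's that procedure in Python (the coin-flip data are made up for illustration, and I'm assuming scipy is available):

```python
from scipy.stats import binomtest

# Null hypothesis ~H: the coin is fair (p = 0.5).
# Made-up data: 60 heads in 100 flips.
result = binomtest(60, n=100, p=0.5, alternative='two-sided')

alpha = 0.05  # the pre-agreed significance level
print(f"p-value = {result.pvalue:.4f}")
if result.pvalue < alpha:
    print("Reject the null hypothesis ~H.")
else:
    print("Fail to reject the null hypothesis ~H.")
```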
Now, if that is a fair summary, then this big controversy between frequentists and Bayesians must mean that there is a sizable collection of people who think that the above procedure is a better way of obtaining knowledge than performing Bayesian updates. But for the life of me, I can't see how anyone could possibly think that. I mean, not only is the significance threshold arbitrary, not only are we depriving ourselves of valuable information by "accepting" or "not accepting" a hypothesis rather than quantifying our certainty level, but...what about P(E|H)?? (Not to mention P(H).) To me, it seems blatantly obvious that an epistemology (and that's what it is) like the above is a recipe for disaster -- specifically in the form of accumulated errors over time.
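For contrast, here's the Bayesian update described above, which needs all three quantities; the numbers for P(H), P(E|H), and P(E|~H) are made up:

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """Bayes' Theorem: P(H|E) = P(E|H) P(H) / P(E)."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

# Made-up numbers: P(H) = 0.3, P(E|H) = 0.8, P(E|~H) = 0.1.
print(posterior(0.3, 0.8, 0.1))  # ~0.774: a graded degree of belief, not a bin
```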
I know that statisticians are intelligent people, so this has to be a strawman or something. Or at least, there must be some decent-sounding arguments that I haven't heard -- and surely there are some frequentist contrarians reading this who know what those arguments are. So, in the spirit of Alicorn's "Deontology for Consequentialists" or ciphergoth's survey of the anti-cryonics position, I'd like to suggest a "Frequentism for Bayesians" post -- or perhaps just a "Frequentism for Dummies", if that's what I'm being here.
Not necessarily better. Just more convenient for the thumbs up/thumbs down way of looking at evidence that scientists tend to like.
It's a convention. The point is to have a pre-agreed, low significance level so that testers can't screw with the result of a test by arbitrarily jacking the significance level up (if they want to reject a hypothesis) or turning it down (if they don't). The significance level has to be low to limit the risk of a type I error (rejecting a null hypothesis that is actually true).
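One way to see what the convention buys you: if the null hypothesis is true, a test run at significance level alpha falsely rejects it about alpha of the time. A quick simulation sketch (assuming numpy and scipy):

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
alpha = 0.05
trials = 10_000

# The null hypothesis (population mean = 0) is true by construction here.
false_rejections = sum(
    ttest_1samp(rng.normal(loc=0.0, size=30), popmean=0.0).pvalue < alpha
    for _ in range(trials)
)
print(false_rejections / trials)  # ~0.05: the type I error rate matches alpha
```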
The certainty level is effectively communicated via the significance level and the p-value itself. (And the use of a reject vs. don't reject dichotomy can be desirable if one wishes to decide between performing some action and not performing it based on some data.)
A frequentist can deal in likelihoods, for example by doing hypothesis tests of likelihood ratios. As for priors, a frequentist encapsulates them in parametric and sampling assumptions about the data. A Bayesian might give a low weight to a positive result from a parapsychology study because of their "low priors", but a frequentist might complain about sampling procedures or cherrypicking being more likely than a true positive. As I see it, the two say essentially the same thing; the frequentist is just being more specific than the Bayesian.
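To illustrate the point about frequentists dealing in likelihoods: a likelihood ratio test compares the maximized likelihood under the null to the maximized likelihood under the alternative, and by Wilks' theorem the statistic -2 log(L0/L1) is approximately chi-squared for large samples. A sketch for a binomial proportion, with made-up counts:

```python
from math import log
from scipy.stats import chi2

def binom_loglik(k, n, p):
    """Binomial log-likelihood, dropping the constant n-choose-k term."""
    return k * log(p) + (n - k) * log(1 - p)

k, n = 60, 100   # made-up data: 60 successes in 100 trials
p_null = 0.5     # null hypothesis: p = 0.5
p_mle = k / n    # alternative: p free, likelihood maximized at the MLE

lr_stat = -2 * (binom_loglik(k, n, p_null) - binom_loglik(k, n, p_mle))
p_value = chi2.sf(lr_stat, df=1)  # one free parameter => 1 degree of freedom
print(f"LR statistic = {lr_stat:.3f}, p-value = {p_value:.4f}")
```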
No. P-values are not equivalent when they are calculated using different statistics, or even the same statistic but a different sample size. On the latter point see Royall, 1986.
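A sketch of the sample-size dependence (this is a simpler illustration than Royall's actual argument): hold the observed effect fixed and vary n, and the p-value from a one-sample z-test swings from unremarkable to tiny, so a bare p-value can't be read as a sample-size-free measure of evidence. The effect size here is made up:

```python
from math import sqrt
from scipy.stats import norm

effect = 0.1  # made-up observed sample mean; known sd = 1, null mean = 0
for n in (10, 100, 1000, 10_000):
    z = effect * sqrt(n)     # z = (sample_mean - 0) / (1 / sqrt(n))
    p = 2 * norm.sf(abs(z))  # two-sided p-value
    print(f"n = {n:>6}: p = {p:.4f}")
```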
I'd say the frequentist is using Bayesian reasoning informally; Jaynes discusses this exact problem from a Bayesian perspective at the beginning of Chapter 5 of his magnum opus, Probability Theory: The Logic of Science.
Sorry. You are quite right, and I was sloppy. I had in mind the implicit idea that, holding the choices of statistical test and data collection procedure constant, different p-values suggest how strongly one should reject the null hypothesis, and I should have made that explicit. It is absolutely true that if I just ask someone, "Test A gave me p = 0.008 and Test B gave me p = 0.4, which test's null hypothesis is worse off?", the correct answer is "how should I know?"
Yep. I think this is an example of the frequentist encapsulating what a Bayesian would call priors in their sampling assumptions.