Eliezer_Yudkowsky comments on Case study: abuse of frequentist statistics - Less Wrong
This is going to sound silly, but...could someone explain frequentist statistics to me?
Here's my current understanding of how it works:
We've got some hypothesis H, whose truth or falsity we'd like to determine. So we go out and gather some evidence E. But now, instead of trying to quantify our degree of belief in H (given E) as a conditional probability estimate using Bayes' Theorem (which would require us to know P(H), P(E|H), and P(E|~H)), what we do is simply calculate P(E|~H) (techniques for doing this being, of course, the principal concern of statistics texts). We call that number the "p-value" and place H into one of two bins depending on whether it falls below some threshold (the "significance level") that somebody decided was "low": if P(E|~H) is below the threshold, we put H into the "accepted" bin (or, as they say, we reject the null hypothesis ~H); otherwise, we put H into the "not accepted" bin (that is, we fail to reject ~H).
Now, if that is a fair summary, then this big controversy between frequentists and Bayesians must mean that there is a sizable collection of people who think that the above procedure is a better way of obtaining knowledge than performing Bayesian updates. But for the life of me, I can't see how anyone could possibly think that. I mean, not only is the "p-value" threshold arbitrary, not only are we depriving ourselves of valuable information by "accepting" or "not accepting" a hypothesis rather than quantifying our certainty level, but...what about P(E|H)?? (Not to mention P(H).) To me, it seems blatantly obvious that an epistemology (and that's what it is) like the above is a recipe for disaster -- specifically in the form of accumulated errors over time.
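The gap this comment points at shows up with a few hypothetical numbers (none of them come from the thread). Following the comment's framing, P(E|~H) stands in for the p-value, although a real p-value is a tail probability over all outcomes at least as extreme as E. Three pieces of evidence with the same P(E|~H) = 0.04 get the same verdict from the threshold procedure, yet Bayes' theorem gives very different posteriors once P(E|H) and P(H) are taken into account. The function names are illustrative only.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem: P(H|E) from P(H), P(E|H) and P(E|~H)."""
    joint_h = prior_h * p_e_given_h                # P(H and E)
    joint_not_h = (1 - prior_h) * p_e_given_not_h  # P(~H and E)
    return joint_h / (joint_h + joint_not_h)


def threshold_verdict(p_e_given_not_h, alpha=0.05):
    """The binning procedure as summarised above: only P(E|~H) is consulted."""
    return "reject ~H" if p_e_given_not_h < alpha else "fail to reject ~H"


# Hypothetical evidence: same P(E|~H) = 0.04, different P(H) and P(E|H).
for prior_h, p_e_h in [(0.5, 0.05), (0.5, 0.8), (0.01, 0.8)]:
    print(threshold_verdict(0.04), round(posterior(prior_h, p_e_h, 0.04), 3))
# reject ~H 0.556
# reject ~H 0.952
# reject ~H 0.168
```

The verdict never changes, while the posterior ranges from barely better than a coin toss, to about 95%, to under 17%.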
I know that statisticians are intelligent people, so this has to be a strawman or something. Or at least, there must be some decent-sounding arguments that I haven't heard -- and surely there are some frequentist contrarians reading this who know what those arguments are. So, in the spirit of Alicorn's "Deontology for Consequentialists" or ciphergoth's survey of the anti-cryonics position, I'd like to suggest a "Frequentism for Bayesians" post -- or perhaps just a "Frequentism for Dummies", if that's what I'm being here.
No no no. That would be a hundred times saner than frequentism. What you actually do is take the real data e-12 and put it into a giant bin E that also contains e-1, e-3, and whatever else you can make up a plausible excuse to include or exclude, and then you calculate P(E|~H). This is one of the key points of flexibility that enables frequentists to get whatever answer they like, the other being the choice of control variables in multivariate analyses.
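To make the "giant bin" point concrete, here is a standard textbook illustration (not from the thread): the same 12 coin flips, 9 heads and 3 tails, tested one-sided against a fair coin (~H), get different p-values depending on which unobserved outcomes are lumped into E. Here the choice of bin is fixed by the experimenter's stopping rule: "flip 12 times" versus "flip until the 3rd tail". The data and function names are hypothetical.

```python
from fractions import Fraction
from math import comb

HALF = Fraction(1, 2)  # P(heads) under the null hypothesis of a fair coin


def p_fixed_n(heads, n):
    """Bin E = every run of exactly n flips with at least `heads` heads (binomial)."""
    return sum(comb(n, k) for k in range(heads, n + 1)) * HALF ** n


def p_stop_at_tails(heads, tails):
    """Bin E = every run stopped at the `tails`-th tail with at least `heads`
    heads before it (negative binomial): 1 - P(fewer heads than observed)."""
    below = sum(comb(k + tails - 1, tails - 1) * HALF ** (k + tails)
                for k in range(heads))
    return 1 - below


print(float(p_fixed_n(9, 12)))       # ~0.073: "not significant" at 0.05
print(float(p_stop_at_tails(9, 3)))  # ~0.033: "significant" at 0.05
```

The observed sequence is identical in both cases and so is its likelihood under any bias of the coin; only the set of outcomes that did not happen differs, and that alone moves the answer across the 0.05 line.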
See e.g. this part of the article:
This seems to use "frequentist" to mean "as statistics are actually practiced." It is unreasonable to compare the implementation of A to the ideal form of B. In particular, the problem with the Mann-Whitney test seems to me to be that the authors looked up a recipe in a cookbook without understanding it, which they could just as easily have done in a Bayesian cookbook.
Can you elaborate on that?
Well, the blatant version would be to take 5 possible control variables and try all 32 possible omissions and inclusions to see if any of the combinations turns up "statistically significant". This might look a little suspicious if you collected the data and then threw some of it away. If you were running regressions on an existing database with lots of potential control variables, why, they'll just have to trust that you never secretly picked and chose.
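A minimal simulation of the blatant version, under assumptions that are not in the thread: the outcome y, the variable of interest x, and five candidate controls are all independent standard-normal noise, so any "significant" coefficient on x is a false positive. Each simulated dataset is fit under all 32 include/exclude choices of controls. The regression uses the Frisch-Waugh-Lovell trick (residualise x and y on the intercept plus chosen controls), and p-values use a normal approximation to the t distribution so that only the standard library is needed. All names and sizes are illustrative.

```python
import itertools
import math
import random


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def residualize(v, basis):
    """Remove from v its projection onto each vector of an orthonormal basis."""
    r = list(v)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def orthonormal_basis(columns):
    """Gram-Schmidt: an orthonormal basis spanning the given columns."""
    basis = []
    for col in columns:
        r = residualize(col, basis)
        norm = math.sqrt(dot(r, r))
        basis.append([ri / norm for ri in r])
    return basis


def p_value_for_x(y, x, controls):
    """Two-sided p-value for x's OLS coefficient in y ~ 1 + x + controls."""
    n = len(y)
    basis = orthonormal_basis([[1.0] * n] + controls)
    rx, ry = residualize(x, basis), residualize(y, basis)
    beta = dot(rx, ry) / dot(rx, rx)
    resid = [b - beta * a for a, b in zip(rx, ry)]
    dof = n - len(basis) - 1  # intercept and controls, plus x itself
    se = math.sqrt(dot(resid, resid) / dof / dot(rx, rx))
    return math.erfc(abs(beta / se) / math.sqrt(2))  # normal approximation


def simulate(trials=300, n=100, k=5, alpha=0.05, seed=0):
    rng = random.Random(seed)
    hits_fixed = hits_any = 0
    for _ in range(trials):
        # Null world: y is unrelated to x and to every candidate control.
        y = [rng.gauss(0, 1) for _ in range(n)]
        x = [rng.gauss(0, 1) for _ in range(n)]
        z = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
        pvals = [p_value_for_x(y, x, [z[i] for i in subset])
                 for r in range(k + 1)
                 for subset in itertools.combinations(range(k), r)]
        hits_fixed += pvals[-1] < alpha  # one pre-chosen spec: all controls in
        hits_any += min(pvals) < alpha   # report whichever of the 32 "works"
    return hits_fixed / trials, hits_any / trials
```

Because the all-controls fit is one of the 32, the pick-the-best false-positive rate can never fall below the pre-chosen one, and it is typically higher. With pure-noise controls the 32 fits are highly correlated, so the gap is modest; more varied specifications leave more room to shop.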
Someone who did that might not be able to convince themselves they weren't cheating... but someone who, somehow or other, got an idea of which variables would be most convenient to control for, might well find themselves influenced just a bit in that direction.
I don't see how being a Bayesian gets you out of cherry-picking your causal structure from a large set. You still have to decide which variables are conditional on which other variables.
You put in all the variables, use a hierarchical structure for the prior, use a weakly informative hyperprior, and let the data sort itself out if it can. Key phrase: automatic relevance determination; David MacKay originated the term while doing Bayesian inference for neural nets.
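A minimal sketch of automatic relevance determination for plain linear regression, written for this edit rather than taken from MacKay's neural-net work. Each input gets its own prior precision `alpha[i]` (the hierarchical part), and the alphas and the noise precision `beta` are set by MacKay-style fixed-point updates that maximise the marginal likelihood. That is a point-estimate shortcut (type-II maximum likelihood), not the full weakly-informative-hyperprior treatment the comment describes. The data, the cap `alpha_max`, and the iteration count are illustrative assumptions, and the inputs are assumed centred, so there is no intercept.

```python
import random


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ard_regression(X, y, iters=200, alpha_max=1e12):
    """Linear regression with one prior precision alpha[i] per input,
    tuned by MacKay-style evidence-maximisation fixed-point updates."""
    n, d = len(X), len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(d)]
           for i in range(d)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(d)]
    alpha, beta = [1.0] * d, 1.0
    for _ in range(iters):
        # Posterior over weights: Sigma = (beta X^T X + diag(alpha))^-1,
        # mean = beta Sigma X^T y.
        a = [[beta * xtx[i][j] + (alpha[i] if i == j else 0.0) for j in range(d)]
             for i in range(d)]
        sigma = invert(a)
        mean = [beta * sum(sigma[i][j] * xty[j] for j in range(d)) for i in range(d)]
        # gamma[i] in [0, 1]: how well-determined weight i is by the data.
        gamma = [max(0.0, 1.0 - alpha[i] * sigma[i][i]) for i in range(d)]
        for i in range(d):
            if gamma[i] < 1e-12:
                alpha[i] = alpha_max  # effectively pruned
            else:
                alpha[i] = min(alpha_max, gamma[i] / max(mean[i] ** 2, 1e-300))
        sse = sum((y[k] - sum(X[k][i] * mean[i] for i in range(d))) ** 2
                  for k in range(n))
        beta = (n - sum(gamma)) / sse
    return mean, alpha


def demo(n=200, d=5, seed=0):
    """Hypothetical data: only inputs 0 and 1 matter; the rest are pure noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [2.0 * row[0] - 3.0 * row[1] + rng.gauss(0, 0.5) for row in X]
    return ard_regression(X, y)
```

All five inputs go in; inputs the data do not support typically end up with a large `alpha` (a tight prior around zero) and a weight near zero, while the two real inputs keep weights close to 2 and -3. Nobody had to choose by hand which of the five to include.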
Is that a 'were not'?