Cyan comments on Case study: abuse of frequentist statistics - Less Wrong

Post author: Cyan, 21 February 2010 06:35AM (25 points)




Comment author: cupholder 22 February 2010 02:57:29AM 7 points

I'm not seeing why what you call "the real WTF" is evidence of a problem with frequentist statistics. The fact that the hypothesis test would have given a statistically insignificant p-value whatever the actual 6 data points were just indicates that whatever the population distributions, 6 data points are simply not enough to disconfirm the null hypothesis. In fact you can see this if you look at Mann & Whitney's original paper! (See the n=3 subtable in table I, p. 52.)

I can picture someone counterarguing that this is not immediately obvious from the details of the statistical test, but I would hope that any competent statistician, frequentist or not, would be sceptical of a nonparametric comparison of means for samples of size 3!
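The arithmetic behind the point above can be checked directly. The following sketch (my own, not from the thread; the data values are hypothetical) enumerates the exact null distribution of the rank sum for two groups of three and shows that even maximally separated samples yield p = 0.1 under the exact rank-sum (Mann-Whitney) test:

```python
from itertools import combinations

# With n = 3 per group, even maximally separated data cannot reach
# p < 0.05 under the exact rank-sum (Mann-Whitney) test.
x = [1.0, 2.0, 3.0]  # hypothetical group A: the most extreme separation possible
y = [4.0, 5.0, 6.0]  # hypothetical group B

pooled = sorted(x + y)
observed = sum(pooled.index(v) + 1 for v in x)  # rank sum of group A = 1+2+3 = 6

# Exact null distribution: under H0, every choice of 3 ranks out of 6
# for group A is equally likely -- C(6,3) = 20 possibilities.
rank_sums = [sum(c) for c in combinations(range(1, 7), 3)]
mean = sum(rank_sums) / len(rank_sums)  # 10.5 by symmetry

# Two-sided p-value: fraction of rank sums at least as far from the mean
# as the observed one.
p = sum(abs(s - mean) >= abs(observed - mean) for s in rank_sums) / len(rank_sums)
print(p)  # 0.1 -- the smallest p-value the test can produce at 3 + 3
```

Only the two most extreme of the 20 equally likely rank assignments are as extreme as the observed one, hence 2/20 = 0.1, matching the n=3 entry in Mann & Whitney's table.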

Comment author: Cyan 22 February 2010 03:26:57AM 1 point

Thanks for the pointer to the original paper.

> I'm not seeing why what you call "the real WTF" is evidence of a problem with frequentist statistics.

Check out the title: abuse of frequentist statistics. Yes, at the end, I argue from a Bayesian perspective, but you don't have to be a Bayesian to see the structural problems with frequentist statistics as currently taught to and practiced by working scientists.

> I would hope that any competent statistician, frequentist or not, would be sceptical of a nonparametric comparison of means for samples of size 3!

Me too. But not all papers with shoddy statistics are sent to statisticians for review. Experimental biologists in particular have a reputation for math-phobia. (Does the fact that when I saw the sample size the word "underpowered" instantly jumped into my head count as evidence that I am competent?)

Comment author: cupholder 22 February 2010 03:49:59AM 12 points

> Check out the title: abuse of frequentist statistics. Yes, at the end, I argue from a Bayesian perspective, but you don't have to be a Bayesian to see the structural problems with frequentist statistics as currently taught to and practiced by working scientists.

I agree that frequentist statistics are often poorly taught and understood, and that this holds however you like to do your statistics. Still, the main post feels to me like a sales pitch for Bayes brand chainsaws that's trying to scare me off Neyman-Pearson chainsaws by pointing out how often people using Neyman-Pearson chainsaws accidentally cut off a limb with them. (I am aware that I may be the only reader who feels this way about the post.)

> (Does the fact that when I saw the sample size the word "underpowered" instantly jumped into my head count as evidence that I am competent?)

Yes, but it is not sufficient evidence to reject the null hypothesis of incompetence at the 0.05 significance level. (I keed, I keed.)

Comment author: thomblake 22 February 2010 01:50:20PM 4 points

> a sales pitch for Bayes brand chainsaws

I get that impression a lot around here.

Comment author: Cyan 22 February 2010 04:17:04AM 2 points

> Still, the main post feels to me like a sales pitch...

It's a fair point; I'm not exactly attacking the strongest representative of frequentist statistical practice. My only defense is that this actually happened, so it makes a good case study.

Comment author: cupholder 22 February 2010 04:40:54AM 2 points

That's true, and having been reminded of that, I think I may have been unduly pedantic about a fine detail at the expense of the main point.

Comment author: PhilGoetz 25 February 2010 02:25:43PM 0 points

It's a good case study, but it's not evidence of a problem with frequentist statistics.

Comment author: Cyan 25 February 2010 02:36:05PM 0 points

I assert that it is evidence in my concluding paragraph, but it's true that I don't give an actual argument. Whether one counts it as evidence would seem to depend on the causal assumptions one makes about the teaching and practice of statistics.

Comment author: PhilGoetz 25 February 2010 10:25:58PM 1 point

Perhaps it's frequentist evidence against frequentist statistics.

Comment author: Cyan 26 February 2010 12:30:07AM 1 point

I think this is just a glib rejoinder, but if there's a deeper thought there, I'd be interested to hear it.

Comment author: PhilGoetz 27 February 2010 04:04:02AM 2 points

The critique of frequentist statistics, as I understand it - and I don't think I do - is that frequentists like to count things, and trust that having large sample sizes will take care of biases for them. Therefore, a case in which frequentist statistics co-occurs with bad results counts against use of frequentist statistics, and you don't have to worry about why the results were bad.

The whole Bayesian vs. frequentist argument seems a little silly to me. It's like arguing that screws are better than nails. It's true that, for any particular joint you wish to connect, a screw will probably connect it more securely and reversibly than a nail. That doesn't mean there's no use for nails.

Comment author: brian_jaress 23 February 2010 06:02:49PM 3 points

I think that, in this case, the underlying problem was not caused by the way frequentist statistics are commonly taught and practiced by working scientists:

> In the present case, the null hypothesis is that the old method and the new method produce data from the same distribution; the authors would like to see data that do not lead to rejection of the null hypothesis.

I'm no statistician, but I'm pretty sure you're not supposed to make your favored hypothesis the null hypothesis. That's a pretty simple rule and I think it's drilled into students and enforced in peer review.

I see that as the underlying problem because it reverses the burden of proof. If they had done it the right way around, six data points would have been not enough to support their method instead of being not enough to reject it. Making your favored hypothesis the null hypothesis can allow you, in the extreme, to rely on a single data point.
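The burden-of-proof reversal can be made concrete. In this sketch (my own construction; the helper `min_two_sided_p` is not from the thread), the smallest two-sided p-value an exact two-group rank-sum test can ever produce is 2/C(2n, n), since only the two most extreme of the C(2n, n) equally likely rank assignments are at least as extreme as themselves:

```python
from math import comb

def min_two_sided_p(n):
    """Smallest achievable two-sided p-value for an exact rank-sum test
    comparing two groups of size n (assuming no ties)."""
    return 2 / comb(2 * n, n)

for n in range(1, 5):
    print(n, round(min_two_sided_p(n), 4))
# n=1 -> 1.0, n=2 -> 0.3333, n=3 -> 0.1, n=4 -> 0.0286
```

With fewer than four observations per group, failing to reject at the 0.05 level is guaranteed, so a favored hypothesis placed in the null "survives" automatically, no matter how little data is collected.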

Comment author: Cyan 23 February 2010 06:18:08PM 1 point

In the OP I did refer to that when I wrote:

> Now even from a frequentist perspective, this is wacky. Hypothesis testing can reject a null hypothesis, but cannot confirm it, as discussed in the first paragraph of the Wikipedia article on null hypotheses.

You wrote:

> That's a pretty simple rule and I think it's drilled into students and enforced in peer review.

Not all papers are reviewed by people who know the rule. I was taught that rule over ten years ago, and I didn't remember it when my colleague showed me the analysis. (I did recall it eventually, just after I ran the sanity check. Evidence against my competence!) My colleague whose job it was to review the paper didn't know/recall the rule either.

Comment author: PhilGoetz 25 February 2010 02:09:39PM -1 points

> Check out the title: abuse of frequentist statistics. Yes, at the end, I argue from a Bayesian perspective, but you don't have to be a Bayesian to see the structural problems with frequentist statistics as currently taught to and practiced by working scientists.

Well, I don't see the structural problems. (I don't even know what a structural problem is.)

Somebody, please write a top-level post addressing this. Stop saying "Frequentists are bad" and leaving it at that. This is a great story, but it's not valid argumentation to try to convert it into an anti-frequentist tract.

Comment author: Kevin 25 February 2010 02:18:42PM 1 point

I'd love to see a top-level post where someone suggests the best and/or most realistic way for scientists to do their statistics. I'm actually rather ignorant with regard to probability theory. I got a D in second-semester frequentist statistics (hard teacher + I didn't go to class or try very hard on the homework), which is indicative of how little I learned in that class. I did better in my applied statistics classes.

When is it good for scientists to do null hypothesis testing?

Comment author: Cyan 25 February 2010 02:13:08PM 0 points

What specifically is the "this" you want addressed? I'm not sure what its referent is.