Cyan comments on Case study: abuse of frequentist statistics - Less Wrong
Okay, I think that makes sense. Let me put it into my own words:
The test is guaranteed not to be statistically significant merely by virtue of cutting up the outcome space into pieces, each of which has at least a 5% chance of happening. And further, because the null hypothesis has been (arbitrarily) defined to be "the two methods are the same", statistical insignificance means a favorable result.
Does that about cover it? If so, that's pretty bad.
That part isn't right, but the rest is.
So I should have said "for the nine outcomes they considered, they all had at least a 5% chance of happening"?
The p-value is the probability of getting a result "at least this extreme" given the null hypothesis, where "extreme" means "deviating from the null hypothesis", however that's defined. So, the test cut the outcome space into pieces, the most extreme of which had at least a 5% chance of happening.
I think.
... under the null hypothesis. I actually forgot this detail when replying to komponisto.
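To make the point above concrete, here is a minimal sketch in Python. The nine probabilities are made up for illustration (they are not the numbers from the study), and the outcomes are assumed to be ordered from least to most extreme. It computes each outcome's p-value as the null probability of that outcome or anything more extreme, and shows that if the most extreme piece already has at least a 5% chance under the null hypothesis, no outcome can ever reach significance at the 5% level.

```python
from fractions import Fraction

def p_values(null_probs):
    """Return the p-value of each outcome.

    null_probs: probabilities of the outcomes under the null hypothesis,
    ordered from least extreme to most extreme (an assumption of this sketch).
    The p-value of an outcome is the total null probability of that outcome
    and every outcome more extreme than it.
    """
    tails = []
    tail = Fraction(0)
    # Walk from the most extreme outcome backwards, accumulating the tail.
    for prob in reversed(null_probs):
        tail += prob
        tails.append(tail)
    return list(reversed(tails))

# Hypothetical null distribution over nine ordered outcomes (percentages).
null_probs = [Fraction(n, 100) for n in (20, 15, 15, 12, 10, 9, 7, 6, 6)]
assert sum(null_probs) == 1

pvals = p_values(null_probs)
alpha = Fraction(5, 100)

# The smallest achievable p-value is the null probability of the most
# extreme outcome, so the test can never reject at the 5% level here.
smallest_p = min(pvals)
can_ever_reject = any(p < alpha for p in pvals)
```

With these made-up numbers, `smallest_p` is 6% and `can_ever_reject` is `False`: whatever the data turn out to be, the result is "not significant".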