Open thread, 24-30 March 2014

Metus

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Duration set to six days to encourage Monday as first day.

Am I confused about frequentism?

I'm currently learning about hypothesis testing in my statistics class. The idea is that you perform some test and you use the results of that test to calculate:

P(data at least as extreme as your data | Null hypothesis)

This is the p-value. If the p-value is below a certain threshold then you can reject the null hypothesis (which is the complement of the hypothesis that you are trying to test).
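To make the definition concrete, here is a minimal sketch of an exact one-sided p-value for a coin-flip experiment. The coin example, the function name `binomial_p_value`, and the numbers (8 heads in 10 flips, a fair-coin null) are illustrative assumptions, not part of the original question.

```python
from math import comb

def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(at least `heads` heads in `flips` flips | null).

    The null hypothesis here is "the coin lands heads with probability p_null".
    """
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# Hypothetical data: 8 heads in 10 flips, tested against a fair coin.
p = binomial_p_value(heads=8, flips=10)
print(f"p-value = {p:.4f}")  # p-value = 0.0547
```

With the conventional 0.05 threshold, this result would narrowly fail to reject the null, even though 8 heads in 10 flips looks lopsided.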

Put another way:

P(data | hypothesis) = 1 - p-value

and if 1 - p-value is high enough then you accept the hypothesis. (My use of "data" is handwaving and not quite correct but it doesn't matter.)

But it seems more useful to me to calculate P(hypothesis | data). And that's not quite the same thing.

So what I'm wondering is whether under frequentism P(hypothesis | data) is actually meaningless. The hypothesis is either true or false, and depending on whether it's true or not the data has a certain propensity of turning out one way or the other. It's meaningless to ask what the probability of the hypothesis is; you can only ask what the probability of obtaining your data is under certain assumptions.

But it seems more useful to me to calculate P(hypothesis | data). And that's not quite the same thing.

It is not the same thing and knowing P(hypothesis | data) would be very useful. Unfortunately, it is also very hard to estimate because usually the best you can do is calculate the probability, given the data, of a hypothesis out of a fixed set of hypotheses which you know about and for which you can estimate probabilities. If your understanding of the true data-generation process is not so good (which is very common in real life) your P(hypothesis | data) is going to be pretty bad and what's worse, you have no idea how bad it is.
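The "fixed set of hypotheses" point can be sketched with Bayes' rule. The example below is an illustration under stated assumptions: the two candidate coin biases, the equal priors, and the function name `posterior` are all hypothetical choices made for the sketch.

```python
from math import comb

def posterior(heads: int, flips: int, priors: dict[float, float]) -> dict[float, float]:
    """P(hypothesis | data) for each candidate heads-probability in `priors`.

    `priors` maps each candidate bias to its prior probability. The result is
    normalized over this set only, so it silently assumes that one of the
    listed hypotheses is the true data-generating process.
    """
    unnormalized = {
        bias: prior * comb(flips, heads) * bias**heads * (1 - bias) ** (flips - heads)
        for bias, prior in priors.items()
    }
    total = sum(unnormalized.values())
    return {bias: weight / total for bias, weight in unnormalized.items()}

# Hypothetical data: 8 heads in 10 flips; assumed candidates: fair coin vs. 80% heads.
post = posterior(heads=8, flips=10, priors={0.5: 0.5, 0.8: 0.5})
for bias, prob in post.items():
    print(f"P(bias={bias} | data) = {prob:.3f}")
```

The posterior always sums to 1 over the listed candidates, whether or not any of them is close to the truth. If the real process is neither a fair coin nor an 80% coin, the numbers still look confident, which is the "no idea how bad it is" problem described above.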

pcm

That may be true if you have little influence over what data is available. Frequentists are mainly interested in situations where they can create experiments that cause P(hypothesis) to approach 0 or 1. The p-value is intended to be good at deciding whether the hypothesis has been adequately tested, not at deciding whether to believe the hypothesis given crappy data.
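One way to picture the role of experiment design is to hold the observed proportion fixed and increase the sample size. The sketch below assumes a coin-flip setting with 60% heads observed at each sample size; the setting, the numbers, and the function name `binomial_p_value` are illustrative assumptions, not something stated in the comment.

```python
from math import comb

def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(at least `heads` heads in `flips` flips | null)."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# Same observed proportion (60% heads), increasingly large hypothetical experiments.
sample_sizes = [10, 100, 1000]
p_values = [binomial_p_value(heads=6 * n // 10, flips=n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"flips={n:5d}  p-value={p:.3g}")
```

With 10 flips the result is unremarkable under a fair-coin null, while the same proportion over 1000 flips yields a vanishingly small p-value. A larger experiment is what makes the test decisive.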

IlyaShpitser

It's not meaningless, but people who follow R. A. Fisher's ideas for rejecting the null do not use p(hypothesis | data). "Meaningless" would be if frequentists literally did not have p(hypothesis | data) in their language, which is not true because they use probability theory just like everybody else.

Don't ask LessWrong about what frequentists claim, ask frequentists. Very few people on LessWrong are statisticians.
