How does using priors affect the concept of statistical significance? The scientific convention is to use a 5% threshold for significance, no matter whether the hypothesis has been given a low or a high prior probability.

If we momentarily set aside the possibility that there are general methodological issues with statistical significance, how does the use of priors specifically affect the appropriateness of using it?


Statistical significance is about long-run rejection rates, which have no direct connection to priors or probabilities; the idea is simply to define a procedure which has certain properties when repeated many times.
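To make the "long-run properties" point concrete, here is a minimal simulation sketch (Python; the normal data, the sample size of 30, and the one-sample t-test are my own illustrative choices, not anything from the comment): when the null hypothesis is true, a test run at the 5% level rejects it in roughly 5% of repetitions, whatever prior anyone holds.

```python
import numpy as np
from scipy import stats

# Simulate many experiments in which the null hypothesis is actually true
# (the data really do have mean 0), and count how often p < 0.05.
rng = np.random.default_rng(0)
n_experiments = 100_000
samples = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, 30))

_, p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1)
rejection_rate = (p_values < 0.05).mean()

print(f"Long-run rejection rate under a true null: {rejection_rate:.3f}")
# Prints approximately 0.05 -- the procedure's advertised long-run property.
```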

If you introduce priors and power, then you can see what sorts of probabilities come out when you calculate the errors, and you can see that with low-prior-probability claims a p<0.05 means many more mistakes than with high-prior-probability claims. This is what the famous "Why most published research findings are false" paper is all about. It's the same thing as the classic cancer screening graphs like in http://www.nature.com/news/scientific-method-statistical-errors-1.14700 or Yudkowsky's tutorial: you can measure data (the mammography) with a false positive rate of 5% (5% of the time when someone has no cancer, the mammography reports cancer), but that doesn't mean there's a 95% chance they have cancer...
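A quick Bayes'-theorem sketch of that screening example (the 5% false-positive rate comes from the comment above; the 1% prevalence and 80% sensitivity are illustrative assumptions, not figures anyone cited):

```python
# P(cancer | positive test) via Bayes' theorem.
prevalence = 0.01            # assumed base rate of cancer among those screened
sensitivity = 0.80           # assumed P(positive | cancer)
false_positive_rate = 0.05   # P(positive | no cancer), as in the comment

p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive_rate
p_cancer_given_positive = prevalence * sensitivity / p_positive

print(f"P(cancer | positive mammography) = {p_cancer_given_positive:.1%}")
# Roughly 14% under these assumptions -- nowhere near 95%.
```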

See also the papers linked in http://lesswrong.com/lw/g13/against_nhst/

That said, if you want to interpret p-values as something more interesting like Ioannidis's PPV or posterior probabilities, you can interpret them as Bayes factors in a hypothesis-testing approach; see "Calibration of p Values for Testing Precise Null Hypotheses", Sellke et al 2001, or for a more readable explanation, http://www.nature.com/nrg/journal/v10/n10/box/nrg2615_BX2.html , which explains that the Bayes factor must be less than -1/(e * p * ln(p)).

If I've understood it right, a p=0.05 result would imply that the Bayes factor is < -1/(2.71828 * 0.05 * ln(0.05)), or <2.46. So if you previously gave 1:100 odds, then upon observing p=0.05 you'd give at most 2.46:100 odds, moving from ~1% to a bit under 2.4%. If the result had been p=0.01, the Bayes factor would be <7.99, so you'd instead be at a posterior of around 7.4%. Given these two examples, you can see how weak p-values are at the usual significance levels.
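For anyone who wants to check the arithmetic, here is a small sketch of the Sellke et al bound and the odds update above (Python; the 1:100 prior odds are the comment's own example):

```python
import math

def bayes_factor_bound(p):
    """Upper bound on the Bayes factor against the null, -1/(e * p * ln p),
    valid for p < 1/e (Sellke et al 2001)."""
    return -1.0 / (math.e * p * math.log(p))

prior_odds = 1 / 100  # 1:100 odds, i.e. roughly a 1% prior probability
for p in (0.05, 0.01):
    bf = bayes_factor_bound(p)
    posterior_odds = prior_odds * bf
    posterior_prob = posterior_odds / (1 + posterior_odds)
    print(f"p = {p}: BF < {bf:.2f}, "
          f"posterior odds < {posterior_odds:.3f}:1 (~{posterior_prob:.1%})")
# p = 0.05: BF < 2.46, posterior probability a bit under 2.4%
# p = 0.01: BF < 7.99, posterior probability around 7.4%
```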

I find it helpful to interpret p=0.05 as indicating a tripling of the odds and p=0.01 as a sextupling; it makes it much more intuitive why unexpected results fail to replicate so often - eg if you know that ~1% of drugs survive clinical trials, then a 3x is not that impressive and you can temper your enthusiasm appropriately. (What's nice is that this is so easy to do. You don't have to perform your own little meta-analysis, which half the time you can't do because key data wasn't included; you don't have to write anything in JAGS; you don't even have to run any precanned routines in packages like BayesFactor. p<0.05? 3x. p<0.01? 6x.)
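As a sketch of how cheap the rule of thumb is to apply (the ~1% drug-approval base rate is the comment's own figure; the 3x/6x factors are the rounded heuristics above):

```python
# Rule-of-thumb update for a claim with ~1% prior probability (1:99 odds).
prior_odds = 1 / 99
for p_cutoff, rough_bf in ((0.05, 3), (0.01, 6)):
    posterior_odds = prior_odds * rough_bf
    posterior_prob = posterior_odds / (1 + posterior_odds)
    print(f"p < {p_cutoff}: ~{rough_bf}x odds -> posterior ~{posterior_prob:.1%}")
# p < 0.05 moves ~1% up to only ~2.9%; p < 0.01 moves it to ~5.7%.
```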

They are different concepts; you either use statistical significance or you do Bayesian updating (i.e. using priors):

If you are using a 5% threshold, roughly speaking this means that you will accept a hypothesis if the chance of getting equally strong data when the hypothesis is false is 5% or less.

If you are doing Bayesian updating, you start with a probability for how likely a statement is (this is your prior) and update it based on how likely your data would be if the statement were true or false.
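A minimal sketch contrasting the two procedures on the same evidence (Python; the 80% likelihood under the hypothesis and the two priors are illustrative assumptions, not figures from the comment):

```python
# The same data, judged two ways.
p_data_given_false = 0.05  # chance of data this strong if the hypothesis is false
p_data_given_true = 0.80   # assumed chance of such data if the hypothesis is true

# Statistical significance: the prior never enters the decision.
print("Clears the 5% threshold:", p_data_given_false <= 0.05)

# Bayesian updating: the conclusion depends heavily on the prior.
for prior in (0.01, 0.50):
    posterior = (prior * p_data_given_true) / (
        prior * p_data_given_true + (1 - prior) * p_data_given_false)
    print(f"prior {prior:.0%} -> posterior {posterior:.0%}")
# prior  1% -> posterior ~14%
# prior 50% -> posterior ~94%
```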

Here is an xkcd which highlights the difference: https://xkcd.com/1132/


Sweet!
