Cyan comments on Outside the Laboratory - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (336)
John, I consider myself a 'Bayesian wannabe' and my favorite author thereon is E. T. Jaynes. As such, I follow Jaynes in vehemently denying that the posterior probability following an experiment should depend on "whether Alice decided ahead of time to conduct 12 trials or decided to conduct trials until 3 successes were achieved". See Jaynes's _Probability Theory: The Logic of Science_.
The 0.05 significance level is not just "arbitrary", it is demonstrably too high - in some fields the actual majority of "statistically significant" results fail to replicate, but the failures to replicate don't get into the prestigious journals, and are not talked about and remembered.
I'm sorry, that seems just wrong. The statistics work if there's an unbiased process that determines which events you observe. If Alice conducts trials until 3 successes are achieved, that's a biased process that guarantees the data ends with at least one success.
Surely you accept that if Alice conducts 100 trials and only gives you the successes, you'll get the wrong result no matter the statistical procedure used, so you can't say that biased data collection is irrelevant. You have to either claim that continuing until 3 successes were achieved is an unbiased process, or retreat from the claim that that procedure for collecting the data does not influence the correct interpretation of the results.
If Alice decides to conduct 12 trials, then the sampling distribution of the data is the binomial distribution. If Alice decides to sample until 3 successes are achieved, then the sampling distribution of the data is the negative binomial distribution. These two distributions are proportional when considered as functions of the parameter p (i.e., as likelihood functions). So in this specific case, from a Bayesian point of view the sampling mechanism does not influence the conclusions. (This is in contradistinction to inference based on p-values.)
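A minimal sketch of this point, under assumptions that are not in the comments above: the observed data are taken to be 3 successes in 12 trials with the third success on the last trial, the prior on p is uniform, and the p-values test a hypothetical null of p = 0.5 against p < 0.5. All function names are illustrative. It checks that the two likelihoods differ only by a constant factor, so the posteriors coincide, while the one-sided p-values do not.

```python
from math import comb

def binomial_likelihood(p, n=12, k=3):
    """P(k successes in n trials | p) when n is fixed in advance."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def negative_binomial_likelihood(p, n=12, k=3):
    """P(the k-th success arrives on trial n | p) when k is fixed in advance."""
    return comb(n - 1, k - 1) * p**k * (1 - p)**(n - k)

def grid_posterior(likelihood, grid):
    """Normalized posterior on a grid of p values, assuming a uniform prior."""
    weights = [likelihood(p) for p in grid]
    total = sum(weights)
    return [w / total for w in weights]

def binomial_p_value(n=12, k=3, p0=0.5):
    """P(at most k successes in n trials | p0)."""
    return sum(comb(n, j) * p0**j * (1 - p0)**(n - j) for j in range(k + 1))

def negative_binomial_p_value(n=12, k=3, p0=0.5):
    """P(at least n trials are needed for k successes | p0),
    i.e. at most k - 1 successes in the first n - 1 trials."""
    return sum(comb(n - 1, j) * p0**j * (1 - p0)**(n - 1 - j) for j in range(k))

grid = [i / 1000 for i in range(1, 1000)]

# The ratio of the two likelihoods is the same for every p: comb(12, 3) / comb(11, 2).
ratios = [binomial_likelihood(p) / negative_binomial_likelihood(p) for p in grid]

# A constant factor cancels on normalization, so the posteriors agree.
post_binomial = grid_posterior(binomial_likelihood, grid)
post_negative_binomial = grid_posterior(negative_binomial_likelihood, grid)
max_posterior_gap = max(
    abs(a - b) for a, b in zip(post_binomial, post_negative_binomial)
)

print(f"likelihood ratio range: {min(ratios):.6f} to {max(ratios):.6f}")
print(f"largest posterior difference: {max_posterior_gap:.2e}")
print(f"binomial p-value: {binomial_p_value():.4f}")                    # 0.0730
print(f"negative binomial p-value: {negative_binomial_p_value():.4f}")  # 0.0327
```

With these assumed numbers, the same data fall on opposite sides of the 0.05 threshold depending on the stopping rule, while the posterior on p is unchanged.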
In general, you are correct to say that biased data collection is not irrelevant; this idea is given a complete treatment in Chapter 6 (or 7, I forget which) of Gelman et al.'s Bayesian Data Analysis, 2nd ed.