Cyan comments on Outside the Laboratory - Less Wrong

63 Post author: Eliezer_Yudkowsky 21 January 2007 03:46AM

Comment author: TimFreeman 28 May 2011 08:18:00PM -2 points

I follow Jaynes in vehemently denying that the posterior probability following an experiment should depend on "whether Alice decided ahead of time to conduct 12 trials or decided to conduct trials until 3 successes were achieved".

I'm sorry, that seems just wrong. The statistics work if there's an unbiased process that determines which events you observe. If Alice conducts trials until 3 successes are achieved, that's a biased process that guarantees the data end with at least one success.

Surely you accept that if Alice conducts 100 trials and only gives you the successes, you'll get the wrong result no matter the statistical procedure used, so you can't say that biased data collection is irrelevant. You have to either claim that continuing until 3 successes were achieved is an unbiased process, or retreat from the claim that that procedure for collecting the data does not influence the correct interpretation of the results.

Comment author: Cyan 28 May 2011 09:11:02PM *  5 points

If Alice decides to conduct 12 trials, then the sampling distribution of the data is the binomial distribution. If Alice decides to sample until 3 successes are achieved, then the sampling distribution of the data is the negative binomial distribution. These two distributions are proportional when considered as functions of the parameter p (i.e., as likelihood functions). So in this specific case, from a Bayesian point of view the sampling mechanism does not influence the conclusions. (This is in contradistinction to inference based on p-values.)
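To make the proportionality concrete, here is a minimal Python sketch for the data under discussion (3 successes in 12 trials, with the 12th trial being the third success, as the stopping rule requires). The function names, the grid of p values, and the uniform prior are illustrative assumptions of mine, not anything from the thread. The p-value part further assumes a null hypothesis of p = 0.5 tested against the one-sided alternative p < 0.5.

```python
from math import comb, isclose

N_TRIALS, N_SUCCESSES = 12, 3  # the data discussed in the thread

def binomial_likelihood(p, n=N_TRIALS, k=N_SUCCESSES):
    # Alice fixed n in advance: any k of the n trials may be the successes.
    return comb(n, k) * p**k * (1 - p)**(n - k)

def negative_binomial_likelihood(p, n=N_TRIALS, k=N_SUCCESSES):
    # Alice stopped at the k-th success: the last trial is a success,
    # so only the first n-1 trials can be arranged freely.
    return comb(n - 1, k - 1) * p**k * (1 - p)**(n - k)

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

# Assumed grid of candidate values for p (uniform prior over the grid).
grid = [i / 100 for i in range(1, 100)]

# The ratio of the two likelihoods does not depend on p.
ratios = [binomial_likelihood(p) / negative_binomial_likelihood(p) for p in grid]

# Hence the normalized posteriors are the same under either stopping rule.
post_binomial = normalize([binomial_likelihood(p) for p in grid])
post_negative_binomial = normalize([negative_binomial_likelihood(p) for p in grid])
same_posterior = all(
    isclose(a, b, rel_tol=1e-9)
    for a, b in zip(post_binomial, post_negative_binomial)
)

# p-values under the assumed null p = 0.5 depend on the stopping rule.
# Binomial: probability of 3 or fewer successes in 12 trials.
p_value_binomial = sum(comb(12, j) for j in range(4)) / 2**12
# Negative binomial: probability of needing 12 or more trials for 3 successes,
# i.e. at most 2 successes in the first 11 trials.
p_value_negative_binomial = sum(comb(11, j) for j in range(3)) / 2**11

print(min(ratios), max(ratios))   # both approximately 4
print(same_posterior)             # True
print(p_value_binomial)           # 299/4096, about 0.073
print(p_value_negative_binomial)  # 67/2048, about 0.033
```

The likelihood ratio is the constant C(12,3)/C(11,2) = 220/55 = 4, which cancels on normalization, so the posterior is unchanged. The two p-values differ because they sum over different sets of hypothetical data that were never observed.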

In general, you are correct to say that biased data collection is not irrelevant; this idea is given a complete treatment in Chapter 6 (or 7, I forget which) of Gelman et al.'s Bayesian Data Analysis, 2nd ed.
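As a rough illustration of a collection process that does matter, take the selective-reporting example from the parent comment: Alice runs 100 trials and hands over only the successes. The sketch below is a simulation under assumptions of my own (the function name, the seed, and the naive "successes divided by reported trials" estimator are all hypothetical choices).

```python
import random

def naive_estimate_from_successes_only(p, n_trials=100, seed=0):
    """Simulate Alice's trials, keep only the successes, then estimate p
    as if the reported outcomes were the complete data set."""
    rng = random.Random(seed)
    outcomes = [rng.random() < p for _ in range(n_trials)]
    reported = [x for x in outcomes if x]  # Alice discards every failure
    if not reported:
        return None  # nothing was reported, so there is nothing to estimate
    return sum(reported) / len(reported)

# The naive estimate is the same whatever the true p is.
estimates = {p: naive_estimate_from_successes_only(p) for p in (0.2, 0.5, 0.8)}
print(estimates)
```

Every reported outcome is a success by construction, so the naive estimate is 1.0 regardless of the true p. Unlike the stopping-rule case, the filtering here changes the likelihood itself, and an analysis that ignores it cannot recover p.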