alex_zag_al comments on Beautiful Probability - Less Wrong

34 Post author: Eliezer_Yudkowsky 14 January 2008 07:19AM


Comment author: Elver 14 January 2008 10:12:43AM 2 points

Something popped into my mind while I was reading the example at the very beginning. What about research that sets out to prove one thing, but discovers something else?

A group of scientists wants to see if there's a link between the consumption of Coca-Cola and stomach cancer. They put together a huge questionnaire with dozens of questions and have 1000 people fill it out. Looking at the data, they discover that there is no correlation between Coca-Cola drinking and stomach cancer, but there is a correlation between excessive sneezing and having large ears.

So now we have a group of scientists who set out to test correlation A, but found correlation B in the data instead. Should they publish a paper about correlation B?

Comment author: alex_zag_al 10 December 2012 03:46:48PM *  1 point

I have no idea what's done in actual statistical practice, but it seems to make sense to do this:

Publish the likelihood ratio for each correlation. The likelihood ratio for the <excessive sneezing-large ears> correlation being real and replicable will be very high.
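For concreteness, here is a minimal sketch (in Python) of what publishing a likelihood ratio for one of these correlations could look like. The counts, the independence null model, and the use of a maximized likelihood ratio rather than a full Bayesian one (which would integrate over the free parameters) are all illustrative assumptions of mine, not anything from the thread.

```python
# Toy illustration (assumed numbers): a likelihood ratio for the
# <excessive sneezing - large ears> association in a 2x2 survey table.
# This compares maximized likelihoods under the two models; a proper
# Bayesian likelihood ratio would integrate over the parameters instead.

import numpy as np

# Hypothetical counts from the 1000 questionnaires:
# rows = sneezing (no / yes), columns = large ears (no / yes)
counts = np.array([[520.0, 180.0],
                   [140.0, 160.0]])
n = counts.sum()

# H0: sneezing and ear size are independent -> cell probabilities are
# products of the marginal frequencies
row_p = counts.sum(axis=1) / n
col_p = counts.sum(axis=0) / n
p0 = np.outer(row_p, col_p)

# H1: cell probabilities are unconstrained -> maximized at the observed frequencies
p1 = counts / n

# Multinomial log-likelihood of the table under each model
# (the multinomial coefficient is the same for both, so it cancels in the ratio)
loglik0 = (counts * np.log(p0)).sum()
loglik1 = (counts * np.log(p1)).sum()

likelihood_ratio = np.exp(loglik1 - loglik0)
print(f"likelihood ratio (association vs. independence): {likelihood_ratio:.3g}")
```

With these made-up counts the ratio comes out astronomically large, which is the sense in which the data can strongly favour the association before any prior is brought in.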

Since they bothered to do the test, you can figure that people in the know have decently sized prior odds for the <Coca Cola-stomach cancer> association being real and replicable. There must have been animal studies or a biochemical argument or something. Consequently, a high likelihood ratio for this hypothesis may have been enough to convince them - that is, when it's multiplied with the prior, the resulting posterior may have been high enough to represent the "I'm convinced" state of knowledge.

But the prior odds for the <excessive sneezing-large ears> correlation being real and replicable are the same tiny prior odds you would have for any equally unsupported correlation. When they combine the likelihood ratio with their prior odds, they do end up with much higher posterior odds for <excessive sneezing-large ears> than for other arbitrary-seeming correlations. But still insignificant.
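To make the arithmetic of the last two paragraphs concrete, here is a small sketch of the odds-form update, posterior odds = prior odds × likelihood ratio. The specific prior odds and the shared likelihood ratio are invented numbers; only the structure of the calculation comes from the comment.

```python
# Odds-form Bayes update: posterior_odds = prior_odds * likelihood_ratio.
# All numbers below are invented for illustration.

def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Combine prior odds with a likelihood ratio to get posterior odds."""
    return prior_odds * likelihood_ratio

def odds_to_probability(odds: float) -> float:
    """Convert odds (p / (1 - p)) back into a probability."""
    return odds / (1.0 + odds)

likelihood_ratio = 100.0  # suppose both correlations get the same strong ratio from the data

# <Coca Cola - stomach cancer>: prior evidence (animal studies, a mechanism) -> decent prior odds
coke_prior = 1.0 / 10          # 1:10, an assumed "worth running the study" prior
# <excessive sneezing - large ears>: no prior support -> tiny default prior odds
sneeze_prior = 1.0 / 100_000   # 1:100,000, like any arbitrary unsupported correlation

for name, prior in [("Coca-Cola / stomach cancer", coke_prior),
                    ("sneezing / large ears", sneeze_prior)]:
    post = posterior_odds(prior, likelihood_ratio)
    print(f"{name}: prior odds {prior:g} -> posterior odds {post:g} "
          f"(probability {odds_to_probability(post):.3f})")
```

With these invented numbers the Coca-Cola hypothesis ends up around 90% probable while the sneezing/ears correlation is still only about 0.1% - the same likelihood ratio, but a very different posterior, exactly because the priors differ.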

The critical thing that distinguishes the two hypotheses is whatever previous evidence led them to attempt the test; that's why the prior for the <Coca Cola-stomach cancer> association is higher. It's subjective only in the sense that it depends on what you've already seen - it doesn't depend on your thoughts. Whereas, in what Kindly says is the standard solution, you apply a different test depending upon what the researcher's intentions were.

(I have no idea how you would calculate the prior odds. I mean, Solomonoff induction over your previous observations is the Carnot engine for doing it, but I have no idea how you would actually do it in practice.)