saturn comments on Open Thread June 2010, Part 4 - Less Wrong

Post author: Will_Newsome 19 June 2010 04:34AM




Comment author: cousin_it 30 June 2010 07:36:04PM *  2 points

I guess everyone here already understands this stuff, but I'll still try to summarize why "model checking" is an argument against "naive Bayesians" like Eliezer's OB persona. Shalizi has written about this at length on his blog and elsewhere, as has Gelman, but maybe I can make the argument a little clearer for novices.

Imagine you have a prior, then some data comes in, you update and obtain a posterior that overwhelmingly supports one hypothesis. The Bayesian is supposed to say "done" at this point. But we're actually not done. We have only "used all the information available in the sample" in the Bayesian sense, but not in the colloquial sense!
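For concreteness, the update described above can be sketched in a few lines of Python. The coin-flip setup, hypothesis values, and data here are all invented for illustration:

```python
import numpy as np

# Two candidate coin biases; the data strongly favor one of them.
hypotheses = np.array([0.5, 0.9])   # P(heads) under each hypothesis
posterior = np.array([0.5, 0.5])    # start from a flat prior

data = [1] * 15 + [0]               # 15 heads, 1 tail

for x in data:
    # Likelihood of this observation under each hypothesis
    likelihood = hypotheses if x == 1 else 1 - hypotheses
    posterior = posterior * likelihood
    posterior = posterior / posterior.sum()

print(posterior)  # overwhelmingly favors theta = 0.9
```

After these 16 flips the posterior puts more than 99% of its mass on the 0.9-bias hypothesis, which is the "done" point the comment is talking about.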

See, after locating the hypothesis, we can run some simple statistical checks on the hypothesis and the data to see if our prior was wrong. For example, plot the data as a histogram, plot the distribution the winning hypothesis predicts as another histogram, and if there's a lot of data and the two histograms are wildly different, we know almost for certain that the prior was wrong. As a responsible scientist, I'd do this kind of check. The catch is, a perfect Bayesian wouldn't. The question is, why?
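A rough sketch of such a check. The scenario is made up for illustration: every hypothesis in the prior is a Normal(mu, 1), so the posterior concentrates near mu = mean(data), but the data are actually exponential, and the histogram comparison flags the mismatch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Suppose the data actually come from an exponential distribution...
data = rng.exponential(scale=1.0, size=10_000)

# ...but every hypothesis in our prior was "Normal(mu, 1)". The posterior
# then concentrates near mu = mean(data), so the winning hypothesis is
# roughly Normal(1, 1). Simulate data from it for comparison.
best_fit = rng.normal(loc=data.mean(), scale=1.0, size=10_000)

# Compare the two histograms on a common set of bins.
bins = np.linspace(-3, 8, 40)
h_data, _ = np.histogram(data, bins=bins, density=True)
h_model, _ = np.histogram(best_fit, bins=bins, density=True)

# A large total discrepancy flags that no hypothesis in the prior fits.
discrepancy = 0.5 * np.abs(h_data - h_model).sum() * np.diff(bins)[0]
print(f"estimated total variation distance ~ {discrepancy:.2f}")
```

The estimated distance comes out around 0.3 here, far from zero: the best Normal hypothesis is visibly wrong about the data's shape (no mass below zero, heavy right tail), even though the Bayesian update itself gave it overwhelming posterior weight within the prior's menu.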

Comment author: saturn 30 June 2010 09:35:33PM 2 points

This sounds like a confusion between a theoretical perfect Bayesian and practical approximations. The perfect Bayesian wouldn't have any use for model checking because from the start it always considers every hypothesis it is capable of formulating, whereas the prior used by a human scientist won't ever even come close to encoding all of their knowledge.

(A more "Bayesian" alternative to model checking is to have an explicit "none of the above" hypothesis as part of your prior.)

Comment author: CarlShulman 01 July 2010 11:49:20PM 1 point

NOTA is addressed in the paper as inadequate. What does it predict?

Comment author: Cyan 02 July 2010 03:07:39PM 0 points

See here.

Comment author: cousin_it 01 July 2010 10:36:24AM *  1 point

(A more "Bayesian" alternative to model checking is to have an explicit "none of the above" hypothesis as part of your prior.)

I don't see how that's possible. How do you compute the likelihood of the NOTA hypothesis given the data?

Comment author: Cyan 02 July 2010 03:04:36PM *  2 points

NOTA is not well-specified in the general case, but in at least one specific case it's been done. Jaynes's student Larry Bretthorst made a usable NOTA hypothesis in a simplified version of a radar target identification problem (link to a pdf of the doc).

(Somewhat bizarrely, the same sort of approach could probably be made to work in certain problems in proteomics in which the data-generating process shares the key features of the data-generating process in Bretthorst's simplified problem.)

Comment author: cousin_it 02 July 2010 04:30:49PM *  0 points

If I'm not mistaken, such problems would contain some enumerated hypotheses - point peaks in a well-defined parameter space - and the NOTA hypothesis would be a uniformly thin layer over the rest of that space. Can't tell what key features the data-generating process must have, though. Or am I failing reading comprehension again?

Comment author: Cyan 02 July 2010 08:24:57PM *  0 points

If I'm not mistaken, such problems would contain some enumerated hypotheses - point peaks in a well-defined parameter space - and the NOTA hypothesis would be a uniformly thin layer over the rest of that space

Yep.

Can't tell what key features the data-generating process must have, though.

I think the key features that make the NOTA hypothesis feasible are (i) all possible hypotheses generate signals of a known form (but with free parameters), and (ii) although the space of all possible hypotheses is too large to enumerate, we have a partial library of "interesting" hypotheses of particularly high prior probability for which the generated signals are known even more specifically than in the general case.
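A toy sketch of that construction (not Bretthorst's actual model; the coin-flip setup and all numbers are invented). The "library" holds two point hypotheses for a coin's bias theta, and NOTA is a uniform layer over the whole bias parameter, so its likelihood for a sequence with h heads and t tails is the marginal integral of theta^h (1-theta)^t over theta in [0, 1], which equals h! t! / (h+t+1)!:

```python
from math import factorial

def seq_likelihood(theta, heads, tails):
    """P(a particular head/tail sequence | theta)."""
    return theta**heads * (1 - theta)**tails

def nota_likelihood(heads, tails):
    """Marginal likelihood under theta ~ Uniform(0, 1):
    integral of theta^h (1-theta)^t dtheta = h! t! / (h+t+1)!."""
    return factorial(heads) * factorial(tails) / factorial(heads + tails + 1)

# A partial library of "interesting" point hypotheses, plus NOTA.
heads, tails = 3, 17   # data from a coin the library doesn't contain
priors = {"theta=0.5": 0.45, "theta=0.9": 0.45, "NOTA": 0.10}
likes = {
    "theta=0.5": seq_likelihood(0.5, heads, tails),
    "theta=0.9": seq_likelihood(0.9, heads, tails),
    "NOTA": nota_likelihood(heads, tails),
}

unnorm = {h: priors[h] * likes[h] for h in priors}
z = sum(unnorm.values())
posterior = {h: v / z for h, v in unnorm.items()}
print(posterior)  # NOTA dominates: no library hypothesis fits the data
```

With 3 heads in 20 flips, neither library hypothesis (theta = 0.5 or 0.9) explains the data well, so NOTA ends up with around 90% of the posterior despite its low prior, which is roughly the "enumerated peaks plus a thin uniform layer" picture described above.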