Vaniver comments on Open thread, Jan. 19 - Jan. 25, 2015 - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
General question: I've read somewhere that there's a Bayesian approach to at least partially justifying simplicity arguments / Occam's Razor. Where can I find a good accessible explanation of this?
Specifically: Say you're presented with a body of evidence and you come up with two sets of explanations for that evidence. Explanation Set A consists of one or two elegant principles that explain the entire body of evidence nicely. Explanation Set B consists of hundreds of separate explanations, each of which only explains a small part of the evidence. Assuming your priors for each individual explanation are about equal, is there a Bayesian explanation for our intuition that we should bet on Explanation Set A?
What about if your prior for each individual explanation in Set B is higher than the priors for the explanations in Set A?
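One way to make this concrete (a toy calculation with made-up numbers, not anything from the original question): if the hundreds of explanations in Set B are roughly independent, believing all of Set B means multiplying their priors together, so even individually "likelier" explanations can lose badly to a couple of elegant ones:

```python
# Toy joint-prior comparison (illustrative numbers only).
# Set A: two elegant principles, each with a modest prior.
# Set B: a hundred separate explanations, each with a high prior.
# If the explanations are roughly independent, the joint prior
# is the product of the individual priors.

p_a = 0.5   # prior for each of Set A's 2 principles
p_b = 0.9   # prior for each of Set B's 100 explanations

joint_a = p_a ** 2     # joint prior for Set A
joint_b = p_b ** 100   # joint prior for Set B

print(joint_a, joint_b)  # Set A's joint prior dwarfs Set B's
```

The independence assumption is doing real work here, but the qualitative point survives weaker assumptions: conjunctions of many claims are penalized multiplicatively.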
Example:
Say you're discussing Bible Criticism with a religious friend who believes in the traditional notion of complete Mosaic authorship but who is at least somewhat open to alternatives. To your friend, the priors for Mosaic authorship are much higher than the priors for a documentary or fragmentary hypothesis. (If you want numbers, say that your friend's priors are .95 in favor of Mosaic authorship.)
Now you present the arguments, many of which (if I understand them correctly) boil down to simplicity arguments:
The question is, is your friend justified in rejecting your simplicity-based arguments based on his high priors? What about if his priors were lower, say .6 in favor of Mosaic authorship? What about if he held 50-50 priors?
I think you'll get somewhere by searching for the phrase "complexity penalty." The idea is that the prior probability of any explanation depends on how many terms / free parameters it contains. For your particular example, I think you need to argue that your friend's prior should be different from what it is.
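A minimal sketch of what a complexity penalty can look like (this is an MDL/Solomonoff-flavored toy encoding of my own, not a standard formula): give each hypothesis a prior proportional to 2^(-description length), so every extra bit of machinery halves the prior.

```python
# Toy "complexity penalty" prior: prior ∝ 2^(-description length).
# The bits-per-part figure is an arbitrary stand-in for how much it
# costs to specify each independent piece of an explanation.

def complexity_prior(n_parts, bits_per_part=10):
    """Unnormalized prior that halves for each extra bit of description."""
    return 2.0 ** (-n_parts * bits_per_part)

set_a = complexity_prior(2)    # two elegant principles
set_b = complexity_prior(100)  # a hundred separate explanations

print(set_a / set_b)  # Set A starts out astronomically favored
```

Normalizing over all hypotheses doesn't change the ratio, which is the part that matters for comparing two explanation sets.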
I think it's easier to give a 'frequentist' explanation of why this makes sense, though, by looking at overfitting. The uncertainty in the parameter estimates roughly depends on the number of sample points per parameter, so the fewer parameters in a model, the more we expect each of those parameters to generalize. Another way to think about this: the more free parameters a model has, the more explanatory power it gets "for free," and so we need to penalize the model to account for that. Consider the Akaike information criterion and Bayesian information criterion.
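Here's a rough demonstration of that penalty in action (synthetic data of my own, fitted with ordinary least-squares polynomials): high-degree fits drive the residuals down, but AIC and BIC charge for the extra parameters, so the simple model still wins when the data really are simple.

```python
# Fit polynomials of increasing degree to noisy linear data and
# compare AIC/BIC. Both criteria penalize free parameters, so the
# extra explanatory power of high-degree fits comes at a cost.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = 2.0 * x + rng.normal(0.0, 0.1, size=x.size)  # truly linear + noise
n = x.size

aic, bic = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    k = degree + 1                        # number of free parameters
    sigma2 = np.mean(resid ** 2)          # ML estimate of noise variance
    loglik = -0.5 * n * np.log(sigma2)    # Gaussian log-likelihood, up to a constant
    aic[degree] = 2 * k - 2 * loglik
    bic[degree] = k * np.log(n) - 2 * loglik
    print(degree, round(aic[degree], 1), round(bic[degree], 1))
```

With this seed, both criteria pick the degree-1 model even though the degree-9 fit has smaller residuals; BIC's log(n) penalty makes the margin larger.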