
passive_fist comments on Open thread, Dec. 29, 2014 - Jan 04, 2015 - Less Wrong Discussion

4 Post author: MrMind 29 December 2014 11:10AM




Comment author: passive_fist 30 December 2014 01:19:41AM 4 points

Adding to Vulture's reply (that you cannot make absolute positive statements about truth), the modern view of "Occam's razor" (at least in Bayesian thought) is the minimum description length (MDL) principle (http://en.wikipedia.org/wiki/Minimum_description_length), which can be rigorously formalized. In this formalism, it becomes a prior over models. Multiplying this prior by the likelihood (derived from the data) gives you a posterior over models. Under this posterior, if two models make exactly the same predictions, the simpler one is preferred (note that the more complicated one isn't completely rejected; it's just given lower posterior probability).
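The idea can be sketched in a few lines of Python. This is an illustrative toy, not the formal MDL framework: it assumes a prior of 2^(-L) for a model with description length L bits, and the description lengths and likelihoods below are made-up numbers.

```python
def posterior(desc_lengths_bits, likelihoods):
    """Complexity prior 2^-L times likelihood, normalized via Bayes' rule."""
    unnorm = [2.0 ** -L * lik for L, lik in zip(desc_lengths_bits, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two hypothetical models that fit the data equally well (same likelihood)
# but differ in complexity: 10 bits vs 20 bits to describe.
post = posterior([10, 20], [0.5, 0.5])
```

Here the simpler model ends up with roughly 1024 times the posterior mass of the complex one, yet the complex model keeps a nonzero posterior, matching the point above that it is down-weighted rather than rejected outright.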

There are deep theoretical reasons why MDL is a good way of assigning priors to beliefs. Anyone who wants to reject Occam's razor would have to either propose an alternative system and show that it yields better long-term utility under MDL's assumptions, or show that those assumptions are unfounded.

Comment author: gedymin 31 December 2014 10:22:08AM 0 points

Perhaps you can comment on this paper's claim that "simpler models are always more likely" is false: http://www2.denizyuret.com/ref/domingos/www.cs.washington.edu/homes/pedrod/papers/dmkd99.pdf

Comment author: passive_fist 31 December 2014 09:56:13PM 0 points

That paper doesn't seem to be arguing against Occam's razor. Rather, it seems to be making the more specific point that lower model complexity on training data doesn't necessarily mean lower generalization error. I didn't read through the whole article, so I can't say whether the arguments hold up, but it seems that if you follow the procedure of updating your posterior as new data arrives, the point is moot. Besides, the complexity-prior framework doesn't make that claim at all.