Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Sewing-Machine comments on Occam's Razor - Less Wrong

37 Post author: Eliezer_Yudkowsky 26 September 2007 06:36AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (52)

Sort By: Old

You are viewing a single comment's thread. Show more comments above.

Comment author: [deleted] 20 March 2011 04:38:46PM 0 points [-]

Is there a nice way to quantify how fast does these theoretical priors drop off with the length of something? By how much should I favor simple explanation X over only mediumly more complicated explanation Y.

Comment author: cousin_it 20 March 2011 05:49:04PM *  4 points [-]

Interesting question. If you have a countable infinity of mutually exclusive explanations (e.g. they are all finite strings using letters from some finite alphabet), then your only constraint is that the infinite sum of all their prior probabilities must converge to 1. Otherwise you're free to choose. You could make the convergence really fast (say, by making the prior of a hypothesis inversely proportional to the exponent of the exponent of its length), or slower if you wish to. A very natural and popular choice is restricting the hypotheses to form a "prefix-free set" (no hypothesis can begin with another shorter hypothesis) and then assigning every hypothesis of N bits a prior of 2^-N, which makes the sum converge by Kraft's inequality.

Comment author: CuSithBell 20 March 2011 05:59:28PM 1 point [-]

What is the reasoning behind using a prefix-free set?

Comment author: cousin_it 20 March 2011 06:09:23PM *  3 points [-]

Apart from giving a simple formula for the prior, it comes in handy in other theoretical constructions. For example, if you have a "universal Turing machine" (a computer than can execute arbitrary programs) and feed it an infinite input stream of bits, perhaps coming from a random source because you intend to "execute a random program"... then it needs to know where the program ends. You could introduce an end-of-program marker, but a more general solution is to make valid programs form a prefix-free set, so that when the machine has finished reading a valid program, it knows that reading more bits won't result in a longer but still valid program. (Note that adding an end-of-program marker is one of the ways to make your set of programs prefix-free!)

Overall this is a nice example of an idea that "just smells good" to a mathematician's intuition.

Comment author: CuSithBell 20 March 2011 06:21:32PM *  0 points [-]

Ah! I must have had a brain-stnank - this makes total sense in math / theoretical CS terms, I was substituting an incorrect interpretation of "hypothesis" when reading the comment out of context. Thanks :)