Dominik comments on An Intuitive Explanation of Solomonoff Induction - Less Wrong

53 Post author: Alex_Altair 11 July 2012 08:05AM




Comment author: [deleted] 11 July 2012 04:21:17AM 0 points [-]

The problem is the "minimum message length". This is a property of the map, which tells us nothing about the territory.

Well, unless there is a direct connection between message length and an "unknown but computable probability distribution"?

Comment author: razorcamDOTcom 11 July 2012 08:14:54AM *  0 points [-]

Yes, there is a connection between "the length of the shortest program that generates exactly the message coming from the environment" (the Kolmogorov complexity of this message) and an "unknown but computable probability distribution".

"unknown but computable" means there is a program, but we don't know anything about it, including its size. There is an infinite number of possible programs, and an infinite number of lengths of such programs. But we know that the sum of all the probabilities from "the probability distribution" of all programs that could generate the environment must be exactly equal to one, as per Math's definition of probability.

Let me quote Shane Legg about it:

"Over a denumerable (infinite but countable) space you mathematically can’t put a non-zero uniform prior because it will always sum to more than one. Thus, some elements have to have higher prior probability than others. Now if you order these elements according to some complexity measure (any total ordering will do, so the choice of complexity function makes no difference), then the boundedness of the probability sum means that the supremum of the sequence converges to zero. Thus, for very general learning systems, and for any complexity measure you like, you can only violate Occam’s razor locally. Eventually, as things get more complex, their probabilities must decrease.

In summary: if you want to do pure Bayesian learning over very large spaces you must set a prior that globally respects a kind of Occam’s razor. It’s not a matter of belief, it’s a mathematical necessity." http://www.vetta.org/2011/06/treatise-on-universal-induction/comment-page-1/#comment-21408
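Legg's point about only violating Occam's razor "locally" can be sketched numerically (my own illustration, with an assumed 2^-k base prior): you can hand a few complex hypotheses more weight than simpler ones, but the tail of the distribution must still shrink for the total to stay bounded by one.

```python
# Base prior over complexity-ordered hypotheses: P(k) = 2**-k.
weights = [2.0 ** -k for k in range(1, 50)]

# Local violation of Occam's razor: swap the weights of hypothesis 1
# (simplest) and hypothesis 5 (more complex). The total is unchanged.
weights[0], weights[4] = weights[4], weights[0]
total = sum(weights)
assert total < 1.0 + 1e-12  # still a valid (sub-)probability distribution

# But globally, probabilities must still decrease: every hypothesis far
# enough out in the ordering has less weight than the reshuffled head.
assert max(weights[10:]) < weights[0]
```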

Comment author: [deleted] 12 July 2012 06:20:55AM 0 points [-]

Thanks for the elaboration and the link.