Yes, there is a connection between "the length of the shortest program that generates exactly the message coming from the environment" (the Kolmogorov complexity of this message) and "unknown but computable probability distribution".

"unknown but computable" means there is a program, but we don't know anything about it, including its size. There is an infinite number of possible programs, and an infinite number of lengths of such programs. But we know that the sum of all the probabilities from "the probability distribution" of all programs that could generate the environment must be exactly equal to one, as per Math's definition of probability.

Let me quote Shane Legg about it:

"Over a denumerable (infinite but countable) space you mathematically can’t put a non-zero uniform prior because it will always sum to more than one. Thus, some elements have to have higher prior probability than others. Now if you order these elements according to some complexity measure (any total ordering will do, so the choice of complexity function makes no difference), then the boundedness of the probability sum means that the supremium of the sequence converges to zero. Thus, for very general learning systems, and for any complexity measure you like, you can only violate Occam’s razor locally. Eventually, as things get more complex, their probabilities must decrease.

In summary: if you want to do pure Bayesian learning over very large spaces you must set a prior that globally respects a kind of Occam’s razor. It’s not a matter of belief, it’s a mathematical necessity." http://www.vetta.org/2011/06/treatise-on-universal-induction/comment-page-1/#comment-21408
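A small numeric illustration of Legg's point (my own sketch, not his): a uniform non-zero prior over a countably infinite hypothesis space is impossible, because any constant weight eventually sums past one, whereas a complexity-weighted prior works precisely because its weights shrink toward zero.

```python
epsilon = 0.001                     # candidate uniform prior weight
n = 1                               # count hypotheses until total mass exceeds 1
while n * epsilon <= 1.0:
    n += 1
print(f"With uniform weight {epsilon}, only {n - 1} hypotheses fit under "
      f"total probability 1; hypothesis #{n} already breaks it.")

# By contrast, a complexity-ordered prior like 2^(-k) for the k-th hypothesis
# sums to 1 over infinitely many hypotheses, but only because the weights
# decrease toward zero -- a kind of Occam's razor is forced on us.
print(sum(2.0 ** -k for k in range(1, 60)))   # approaches 1.0
```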

Solomonoff's only assumption is that the environment follows some unknown but computable probability distribution. Everything else is a mathematically proven theorem that follows from this single assumption plus the standard axioms and definitions of mathematics.

So if by "the map" you mean "follows some unknown but computable probability distribution", then you are right in the sense that if you reject this assumption, the theorem no longer applies.

But a lot of scientists do make the assumption that there exists at least one useful computable theory and that their job is to find it.

Kilobug wrote "There is no updating of probabilities, because hypothesis are always right or wrong" Do not forget that any wrong hypothesis can become right by adding some error correcting instructions in it. It will only make the program longer and thus lower its probability. But is seems intuitive that the more a theory needs error corrections, the less it's probable.

Kilobug wrote "there is no room for an hypothesis like "the coin will fall heads 75% of times, tails 25% of time" There is room for both an hypothesis that predict the coin will always fall heads and this hypothesis has a probability of 75%. And an hypothesis that predict the coin will always fall tails and this hypothesis has a probability of 25%.