Complexity of the map vs Complexity of the territory
Solomonoff induction infers facts about the territory from the properties of the map. That is an ontological argument, and therefore not valid.
Solomonoff's only assumption is that the environment follows some unknown but computable probability distribution. Everything else is a mathematically proven theorem that follows from this single assumption, together with the standard axioms and definitions of mathematics.
So if by "the map" you mean "follows some unknown computable probability distribution", then you are right in the sense that if you reject this assumption, the theorem no longer applies.
But many scientists do make the assumption that there exists at least one useful computable theory, and that their job is to find it.
After reading it fully, I have another, deeper problem with this (still great) article: Bayes' theorem totally disappears at the end. Hypotheses that exactly match the observation bitwise have a fixed probability (2^-n, where n is their length), and those that are off even by one bit are discarded. There is no updating of probabilities, because hypotheses are always either right or wrong. There is no concept left of a hypothesis that predicts the position of an electron using a probability distribution, and there is no room for a hypothesis like "the coin will fall heads 75% of the time, tails 25% of the time".
That's a problem both for the article itself (why even mention Bayes' theorem, if in the end it doesn't matter?) and for finding the "truth" about the universe in a quantum world (even if you accept MWI, what you actually observe in the world you end up in will still be random).
I understand that going into full detail on how to handle fuzzy hypotheses (that is, algorithms that don't output just one result, but several results each with a probability) would make the article even longer, but it would still be good to address these issues somewhere, IMHO.
Kilobug wrote "There is no updating of probabilities, because hypothesis are always right or wrong". Do not forget that any wrong hypothesis can be made right by adding some error-correcting instructions to it. This only makes the program longer and thus lowers its probability. And it seems intuitive that the more error corrections a theory needs, the less probable it is.
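This intuition can be sketched numerically under the 2^-n prior the article uses (the program lengths below are made up purely for illustration):

```python
# Sketch (hypothetical lengths): under the 2^-length prior, each bit of
# "error correction" appended to a program halves its prior probability.

def prior(length_bits):
    """Solomonoff-style prior weight for a program of the given bit length."""
    return 2.0 ** -length_bits

base = 100                   # length of the original (wrong) hypothesis, in bits
for k in (0, 1, 8, 32):      # bits of patches needed to match the data
    ratio = prior(base + k) / prior(base)
    print(f"{k:>2} correction bits -> prior shrinks by factor {ratio}")
```

So a theory needing 32 bits of patches keeps only a 2^-32 fraction of its original prior weight, matching the intuition above.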
Kilobug wrote "there is no room for an hypothesis like "the coin will fall heads 75% of times, tails 25% of time"". There is room for both a hypothesis that predicts the coin will always fall heads, which has a probability of 75%, and a hypothesis that predicts the coin will always fall tails, which has a probability of 25%.
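One way to make this concrete (a toy sketch, not the article's construction): represent the biased coin as a weighted mixture of deterministic hypotheses, one per exact flip sequence. Discarding the hypotheses contradicted by observation and renormalising then reproduces ordinary Bayesian prediction:

```python
from itertools import product

# A "75% heads" coin as a mixture of deterministic hypotheses: each
# hypothesis predicts one exact length-n sequence, with prior weight
# 0.75^heads * 0.25^tails.
n = 4
hypotheses = {seq: 0.75 ** seq.count('H') * 0.25 ** seq.count('T')
              for seq in (''.join(p) for p in product('HT', repeat=n))}

def predict_next(observed):
    """P(next flip is H) after discarding hypotheses that mismatch `observed`."""
    alive = {s: w for s, w in hypotheses.items() if s.startswith(observed)}
    total = sum(alive.values())
    heads = sum(w for s, w in alive.items() if s[len(observed)] == 'H')
    return heads / total

print(predict_next(''))     # -> 0.75
print(predict_next('HHT'))  # -> 0.75: the bias survives updating
```

Every individual hypothesis here is still "right or wrong", yet the mixture as a whole behaves exactly like the 75/25 coin.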
The problem is the "minimum message length". This is a property of the map, which tells us nothing about the territory.
Well, unless there is a direct connection between message length and the "unknown but computable probability distribution"?
Yes, there is a connection between "the length of the shortest program that generates exactly the message coming from the environment" (the Kolmogorov complexity of this message) and the "unknown but computable probability distribution".
"Unknown but computable" means there is a program, but we know nothing about it, including its size. There are infinitely many possible programs, and infinitely many possible lengths for such programs. But we know that the probabilities assigned by "the probability distribution" to all programs that could generate the environment must sum to exactly one, per the mathematical definition of probability.
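The connection can be sketched via Kraft's inequality: when programs are written in a prefix-free encoding (as in standard formulations of Solomonoff induction), the 2^-length weights are guaranteed to sum to at most one, so they can serve as (semi)probabilities. A toy illustration with a made-up code set:

```python
# Sketch: 2^-length weights over a prefix-free set of programs sum to
# at most 1 (Kraft's inequality), so they behave like probabilities.

codes = ['0', '10', '110', '1110']   # a prefix-free set (hypothetical programs)

def is_prefix_free(codes):
    """True if no code is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

kraft_sum = sum(2.0 ** -len(c) for c in codes)
print(is_prefix_free(codes), kraft_sum)  # -> True 0.9375
```

The sum falling short of 1 is fine: the leftover mass corresponds to programs not in the set (or that never halt), which is why the result is a semimeasure rather than a full probability distribution.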
Let me quote Shane Legg about it:
"Over a denumerable (infinite but countable) space you mathematically can't put a non-zero uniform prior because it will always sum to more than one. Thus, some elements have to have higher prior probability than others. Now if you order these elements according to some complexity measure (any total ordering will do, so the choice of complexity function makes no difference), then the boundedness of the probability sum means that the supremum of the sequence converges to zero. Thus, for very general learning systems, and for any complexity measure you like, you can only violate Occam's razor locally. Eventually, as things get more complex, their probabilities must decrease.
In summary: if you want to do pure Bayesian learning over very large spaces you must set a prior that globally respects a kind of Occam’s razor. It’s not a matter of belief, it’s a mathematical necessity." http://www.vetta.org/2011/06/treatise-on-universal-induction/comment-page-1/#comment-21408
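Legg's two claims can be checked numerically (a toy sketch; the 2^-(n+1) weighting is just one example of a complexity-penalising prior):

```python
# A prior decreasing in "complexity" n can sum to 1 over countably many
# hypotheses, while any uniform non-zero weight eps cannot: finitely
# many equal terms already push the sum past 1, with infinitely many left.

occam_prior = [2.0 ** -(n + 1) for n in range(60)]
print(sum(occam_prior))        # -> 1.0 (up to float rounding)

eps = 1e-6                     # a would-be uniform prior weight
print(eps * 2_000_000 > 1)     # -> True: 2 million terms alone exceed 1
```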