So, I've been hearing a lot about the awesomeness of Solomonoff induction, at least as a theoretical framework. However, my admittedly limited understanding of it suggests that it would form an epically bad hypothesis if given a random string. So my question is: if I've misunderstood, how does it deal with randomness? And if I've understood correctly, isn't this a rather large gap?
Edit: Thanks for all the comments! My new understanding is that Solomonoff induction can recognize that it is dealing with a random string (because it finds itself giving roughly equal weight to programs that output a 1 or a 0 for the next bit), but it is nevertheless designed to keep looking for a pattern forever. While this is a horrible use of resources, SI is a theoretical framework with infinite resources, so that's a meaningless criticism. Overall this seems acceptable, though if you want to actually implement an SI you'll need to instruct it on when to give up. Furthermore, the SI will not include randomness as a distinct function in its hypotheses, which could lead to improper weighting of priors, but it will still have good predictive power -- and considering that Solomonoff induction was only meant for computable functions, this is a pretty good result.
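To make the "equal weight" intuition concrete, here's a deliberately simplified toy sketch (my own illustration, not real SI): suppose the only hypotheses that fit an incompressible string are "print this literal string" programs, weighted by the usual 2^-length prior. Then the two one-bit continuations are the same length, get the same weight, and the predicted probability of the next bit is exactly 1/2 -- no pattern found.

```python
# Toy model (hypothetical): hypotheses are literal-print programs,
# each weighted 2^-(program length).  For an incompressible string,
# the continuations "...0" and "...1" are equally long, so they get
# equal prior weight and the next-bit prediction is 1/2.

def literal_prediction(observed: str) -> float:
    """P(next bit = 1) under the literal-print toy model."""
    w0 = 2.0 ** -(len(observed) + 1)  # weight of program printing observed + "0"
    w1 = 2.0 ** -(len(observed) + 1)  # weight of program printing observed + "1"
    return w1 / (w0 + w1)

print(literal_prediction("0110100110010110"))  # 0.5
```

Of course real SI sums over all programs, not just literal ones, but for a truly random string the literal-style programs dominate and the symmetry is the point.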
I'm saying that SI's probability of the next bit being 1 doesn't converge to 5/6, for some encodings and (very importantly) when programs produce single outputs instead of probability distributions.
For example, suppose we take a program encoding where the probability does converge to 5/6, then expand it so that every "0" becomes "00" and every "1" becomes "01". Then we add rules to the encoding such that, for every "10" or "11" starting at an even position (so the original re-coding would contain none), an extra zero is appended to the program's output just before halting.
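A minimal sketch of this padded encoding (my own toy construction matching the rules above, not any standard machine model): read the program two bits at a time, decode "00"/"01" back to the original bits, and count each "10"/"11" as one extra zero to append just before halting.

```python
# Toy sketch of the padded encoding: "0" -> "00", "1" -> "01",
# and any "10"/"11" pair at an even position appends an extra "0"
# to the output just before halting.

def expand(program: str) -> str:
    """Re-encode an original program: 0 -> 00, 1 -> 01."""
    return "".join("0" + bit for bit in program)

def decode(expanded: str):
    """Split an expanded program into (original bits, count of appended zeros)."""
    assert len(expanded) % 2 == 0, "pairs must stay aligned to even positions"
    original, extra_zeros = [], 0
    for i in range(0, len(expanded), 2):
        pair = expanded[i:i + 2]
        if pair in ("00", "01"):
            original.append(pair[1])
        else:  # "10" or "11": no original bit, one extra trailing zero
            extra_zeros += 1
    return "".join(original), extra_zeros

print(decode(expand("1101")))      # ('1101', 0) -- clean round trip
print(decode("01" + "10" + "00"))  # ('10', 1)  -- one dumb pair, one appended zero
```

Note that inserting "10"/"11" at any even position leaves the decoded original bits unchanged, which is exactly why the dumb pairs don't break the sequence seen so far.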
Suppose that, after 100 values of the sequence, our new encoding had somehow converged on a 5/6 chance of "1". What happens when we extend the shortest satisfying programs by two more characters? For each shortest program, putting "01" or "00" at the end gives the 5/6 proportion we want. Putting them elsewhere will likely break the intermediate sequence, so those programs are eliminated and don't count. Most importantly, though, all the places where we can't put "01" or "00" will accept a "10" or "11" without breaking the sequence so far. So there's one place where we can put a new pair to get 5/6 as desired, and (about) sixty-five places to put the dumb "then append 0" pairs.
So, starting from the 5/6 we're supposed to converge to, we diverge to ~1/78. And this is a problem with the encoding that making the programs longer doesn't fix; in fact, it makes things worse, since there are proportionally more and more places to put a "10" or "11". Even though all of those programs with dumb characters are eliminated any time a "1" is output, the space of programs is so infested with them that the probability converges to 0 instead of 5/6.
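The counting argument can be sketched numerically. This is a rough back-of-the-envelope model (the exact counts depend on encoding details I'm guessing at, so treat the numbers as illustrative): a shortest program of k pairs has one "useful" continuation slot at the end, where "01" carries the 5/6 mass, plus roughly 2*(k+1) dumb insertions of "10"/"11" that all predict an appended 0.

```python
# Rough model of the divergence: among equal-length extensions of a
# shortest program of k pairs, only the appended-at-the-end pair
# continues the 5/6 pattern; the ~2*(k+1) "10"/"11" insertions all
# predict an extra 0.  (Counts are illustrative assumptions.)

from fractions import Fraction

def p_next_bit_one(k_pairs: int) -> Fraction:
    """P(next bit = 1) over equal-length extensions of a k-pair program."""
    dumb = 2 * (k_pairs + 1)           # "10"/"11" at any of k+1 even positions
    return Fraction(5, 6) / (dumb + 1)  # 5/6 mass spread over one good slot

for k in (8, 32, 128):
    print(k, float(p_next_bit_one(k)))
```

With k around 32 pairs this lands near the ~1/78 in the text, and the probability keeps shrinking as programs get longer, matching the "diverges to 0" conclusion.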
It seems to me that the arithmetic-decoding programs you mention in your first comment churn ad nauseam on their infinite compressed stream. So they never halt, and the instructions "10" and "11" won't matter. SI picks from a space that includes infinite programs, so the instructions can't wait until the end of the stream either.
What can happen, closest to the skew you mention that I can think of, is that a program can contain code to stop arithmetic decoding after the first 100 values and output zeros from then on. This code carries a penalty which inc...