Maybe we're using different definitions of SI? The version that I'm thinking of (which I believe is the original version) quantifies over all possible deterministic programs that output a sequence of bits, using an arbitrary encoding of programs. Then it sums up the weights of all programs that are consistent with the inputs seen so far, and outputs a probability distribution. That turns out to be equivalent to a weighted mixture of all possible computable distributions, though the proof isn't obvious. If we need to adapt this version of SI to output a single guess at each step, we just take the guess with the highest probability.
Does that sound right? If yes, then this page from Li and Vitanyi's book basically says that SI's probability of the next bit being 1 converges to 5/6 with probability 1, unless I'm missing something.
That's not the same definition I was using.
I said the programs have a single output, instead of a probability distribution. It matches the sequence so far or it doesn't, maybe programs are 100% eliminated or 100% not eliminated. The probabilistic nature comes entirely from the summing-over-all-programs part.
If programs can output a probability distribution then clearly the program "return {0:1/6, 1:5/6}" will do very well and cause the results to converge appropriately.
So, I've been hearing a lot about the awesomeness of Solomonoff induction, at least as a theoretical framework. However, my admittedly limited understanding of Solomonoff induction suggests that it would form an epicly bad hypothesis if given a random string. So my question is, if I misunderstood, how does it deal with randomness? And if I understood correctly, isn't this a rather large gap?
Edit: Thanks for all the comments! My new understanding is that Solomonoff induction is able to understand that it is dealing with a random string (because it finds itself giving equal weight to programs that output a 1 or a 0 for the next bit), but despite that is designed to keep looking for a pattern forever. While this is a horrible use of resources, the SI is a theoretical framework that has infinite resources, so that's a meaningless criticism. Overall this seems acceptable, though if you want to actually implement a SI you'll need to instruct it on giving up. Furthermore, the SI will not include randomness as a distinct function in its hypothesis, which could lead to improper weighting of priors, but will still have good predictive power -- and considering that Solomonoff induction was only meant for computable functions, this is a pretty good result.