I am confused about Solomonoff induction and some related concepts. I would appreciate help from anyone who understands this.
What I understand so far is:
- I understand Bayesian statistics and the problem of choosing the priors
- I understand that Kolmogorov complexity can be used to measure the complexity of a hypothesis
- I think I understand that Solomonoff induction consists of evaluating all the possible hypotheses that could generate some observed data, and weighting them according to their complexity (I've tried to write out the formula below)
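If I have the formal definition right, the central object is Solomonoff's universal prior, which weights every program by its length:

$$
M(x) = \sum_{p \,:\, U(p) \text{ starts with } x} 2^{-|p|}
$$

where $U$ is a universal prefix machine and $|p|$ is the length of program $p$ in bits, so shorter programs get exponentially more weight.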
For example, the sequence HTHTT could be generated by different programs:
- A program that flips a fair coin
- A program that flips a coin with p=0.4
- A program that generates "exactly" the sequence HTHTT
- ... and so on; infinitely many programs could be listed here
I do understand that a program that flips a fair coin is simpler than a program that generates exactly that output.
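To make the weighting concrete, here is a toy calculation I tried (the complexity values K, in bits, are invented for illustration, since real Kolmogorov complexities are uncomputable, and `bernoulli_likelihood` is just a helper name I made up):

```python
# Toy complexity-weighted Bayesian update over the three programs above.
# The K values (program lengths in bits) are made up for illustration.

data = "HTHTT"

def bernoulli_likelihood(p, seq):
    """P(seq | coin with P(H) = p)."""
    out = 1.0
    for c in seq:
        out *= p if c == "H" else (1 - p)
    return out

hypotheses = {
    # name: (assumed complexity K in bits, likelihood of the data)
    "fair coin (p=0.5)":   (10, bernoulli_likelihood(0.5, data)),  # 1/2^5
    "biased coin (p=0.4)": (14, bernoulli_likelihood(0.4, data)),
    "print 'HTHTT'":       (18, 1.0),  # deterministic: likelihood 1
}

# Unnormalized posterior weight: prior 2^(-K) times likelihood.
weights = {name: 2.0 ** -K * lik for name, (K, lik) in hypotheses.items()}
total = sum(weights.values())
for name, w in weights.items():
    print(f"{name:22s} posterior ~ {w / total:.3f}")
```

With these invented numbers the fair coin still wins, because its 8-bit complexity advantage over the print program (a prior factor of 2^8) outweighs the print program's 2^5 likelihood advantage.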
What I don't understand is:
How exactly do you weigh complexity against likelihood?
In the previous example, we could say that the first program has complexity X bits while the third program has complexity Y, with Y > X (but only by a small difference, because we don't have that many observations). Here is where things get a bit murky for me. The "predictive" power of the first program for that specific output, being probabilistic, is 1/2^5, while for the third program the likelihood is 1. To prefer the first hypothesis, we have to assign it a prior probability at least 2^5 times higher than the one we assign to the third program. Where does this 2^5 factor come from?
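Spelling out the arithmetic behind that claim, in case I am getting it wrong: the posterior odds factor into prior odds times the likelihood ratio,

$$
\frac{P(H_1 \mid D)}{P(H_3 \mid D)} = \frac{2^{-X}}{2^{-Y}} \cdot \frac{P(D \mid H_1)}{P(D \mid H_3)} = 2^{\,Y - X} \cdot \frac{2^{-5}}{1},
$$

so the first hypothesis is preferred exactly when $Y - X > 5$. The $2^5$ is just the likelihood ratio, one factor of 2 per observed flip, and the prior ratio $2^{Y-X}$ has to beat it.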
I know all of this is a theoretical (uncomputable) construction, but if you can use coin flips in the explanation, that helps a lot. If you spot other places where I am confused but mistakenly thought I wasn't, please point that out.
I think this points to what I don't understand: how do you account for hypotheses that explain randomly generated data? How do you compare a hypothesis that is a random number generator with some parameters against a hypothesis that has a deterministic component?
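Here is a toy model of the comparison I'm struggling with, as the sequence grows. The constants are invented; the one assumption I'm making is that a program that prints the sequence literally must store roughly one bit per flip:

```python
# Random-generator hypothesis vs. "print the data literally" hypothesis,
# as a random-looking sequence of n flips grows.
# All constants are invented for illustration.

K_FAIR = 10           # fixed-length program: "flip a fair coin forever"
K_PRINT_BASE = 8      # boilerplate around the stored literal sequence

for n in [5, 20, 100]:
    lik_fair = 2.0 ** -n       # a fair coin gives any n-flip sequence prob 2^-n
    lik_print = 1.0            # the deterministic program reproduces it exactly
    w_fair = 2.0 ** -K_FAIR * lik_fair
    w_print = 2.0 ** -(K_PRINT_BASE + n) * lik_print  # stores n bits of data
    print(f"n={n:3d}  fair/print weight ratio = {w_fair / w_print:.2f}")
```

If I've set this up correctly, the ratio stays constant: the print program gains a factor of 2 in likelihood with every flip but pays a factor of 2 in prior. Is that the intended resolution?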
Is there a way to understand this without reading the original paper (which will probably take me quite a long time)?
For those of you who do understand this: what was the personal process that took you from knowing about probabilities and likelihoods to understanding Solomonoff induction? Did you have to read the original sources, or did you find good explanations somewhere?
I also don't understand whether this is a calculation you can do in a single step or whether it is a continuous process. In other words, does Solomonoff induction only work if we assume that we keep observing new data?
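For what it's worth, here is how I currently picture the sequential version, with a toy finite hypothesis class instead of all programs (everything here is invented for illustration):

```python
# Sequential prediction with a complexity-weighted mixture of
# Bernoulli hypotheses. The (p_heads, K) pairs are made up.

def predict_next_heads(seq, hypotheses):
    """P(next flip = H | seq) under the 2^(-K)-weighted mixture."""
    num = den = 0.0
    for p_heads, K in hypotheses:
        w = 2.0 ** -K                      # complexity prior
        for c in seq:                      # times likelihood of the data so far
            w *= p_heads if c == "H" else (1 - p_heads)
        num += w * p_heads
        den += w
    return num / den

hyps = [(0.5, 10), (0.4, 14), (0.9, 14)]   # (P(H), assumed complexity in bits)
seq = ""
for flip in "HTHTT":
    print(f"after '{seq}': P(next = H) = {predict_next_heads(seq, hyps):.3f}")
    seq += flip
```

This runs fine after any prefix of the data, which makes me think the calculation is well defined at every step rather than requiring an endless stream, but I'm not sure.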
Sorry for the stupid questions; as you can see, I am confused.