Solomonoff Induction, by Shane Legg

cousin_it

Shane Legg's text Solomonoff Induction has helped me a lot over the last few days. I was trying to iron out the kinks in my understanding of Eliezer's argument in this old thread:

Solomonoff's universal distribution isn't trying to assume the universe is computable. It's trying to beat any computable probability distribution you could put over the universe. In other words, if your attempt to calculate, to talk about and reason about the universe is computable - even if you are talking about systems that you think are uncomputable - then your ratiocinations appear somewhere in Solomonoff induction.

Eliezer's statement is correct (more precisely, a computable human cannot beat Solomonoff in accumulated log scores by more than a constant, even if the universe is uncomputable and loves the human), but understanding his purported proof is tricky. Legg's text doesn't give any direct answer to the question at hand, but all the technical details in there, like the difference between "measures" and "semimeasures", are really damn important if you want to work out the answer for yourself. I know mathematics has many areas where an "intuitive understanding" kinda sorta suffices. This is not one of those areas.

Shane Legg's text Solomonoff Induction has helped me a lot over the last few days. I was trying to iron out the kinks in my understanding of Eliezer's argument in this old thread:

Solomonoff's universal distribution isn't trying to assume the universe is computable. It's trying to beat any computable probability distribution you could put over the universe. In other words, if your attempt to calculate, to talk about and reason about the universe is computable - even if you are talking about systems that you think are uncomputable - then your ratiocinations appear somewhere in Solomonoff induction.

The tricky part for me was proving the equivalence of two views of the Solomonoff prior: as a probability distribution over programs that print bitstrings and never make mistakes, and as a mixture of all "lower semi-computable" probability distributions over finite prefixes of bitstrings that are allowed to make mistakes. Once you derive the second view from the first, the conclusion is indeed trivial.

I started from the view of multiplicative weights over computable predictors, and then when I tried to describe it as an improvement on Solomonoff induction realized it was the same thing. If you write down what the two algorithms actually do, the equivalence to within a constant is clear.

0Wei Dai15y

I mentioned to you a few days ago (in chat) that Scholarpedia stated: [...] When you say "The tricky part for me was proving the equivalence of two views of the Solomonoff prior" do you mean you tried to re-derive this result?

21

Solomonoff Induction, by Shane Legg

21

21

21

Solomonoff Induction, by Shane Legg

21

21