
paulfchristiano comments on Solomonoff Induction, by Shane Legg - Less Wrong Discussion

Post author: cousin_it | 21 February 2011 12:32AM | 13 points




Comment author: paulfchristiano 21 February 2011 02:59:39AM | 2 points

There are straightforward proofs of this fact, I think. For example, observe that Solomonoff induction incurs only a constant additional loss compared to multiplicative weights over the space of computable predictors. Thinking about it this way seems more productive, unless I am missing something important.
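For concreteness, here is a sketch (mine, not from the thread) of the multiplicative-weights bound over a finite pool of predictors; Solomonoff induction plays the same game with the countable pool of all computable predictors, with prior weights decaying exponentially in program length. The expert functions and names below are illustrative assumptions.

```python
import math

def mixture_log_loss(experts, bits):
    """Bayesian mixture over experts (multiplicative weights with
    learning rate 1).  Each expert maps a bit-prefix to its predicted
    probability that the next bit is 1.  Each round, an expert's weight
    is multiplied by the probability it assigned to the observed bit,
    so the mixture's total log loss exceeds the best single expert's
    by at most log2(number of experts) -- a constant."""
    weights = [1.0 / len(experts)] * len(experts)
    loss = 0.0
    for t, bit in enumerate(bits):
        prefix = bits[:t]
        # Probability each expert assigned to the bit that actually occurred.
        probs = [e(prefix) if bit == 1 else 1.0 - e(prefix) for e in experts]
        p_mix = sum(w * q for w, q in zip(weights, probs))
        loss += -math.log2(p_mix)
        # Multiplicative reweighting (Bayes' rule on the expert index).
        weights = [w * q / p_mix for w, q in zip(weights, probs)]
    return loss

# Two toy experts (purely illustrative):
biased = lambda prefix: 0.9                                    # "next bit is probably 1"
alternator = lambda prefix: 0.1 if (prefix and prefix[-1] == 1) else 0.9
```

On the alternating string 101010, the alternator expert loses -log2(0.9) bits per symbol, and the mixture's total loss stays within log2(2) = 1 bit of that best expert, regardless of sequence length.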

Comment author: cousin_it 21 February 2011 08:31:32AM | 0 points

The tricky part for me was proving the equivalence of two views of the Solomonoff prior: as a probability distribution over programs that print bitstrings and never make mistakes, and as a mixture of all "lower semi-computable" probability distributions over finite prefixes of bitstrings that are allowed to make mistakes. Once you derive the second view from the first, the conclusion is indeed trivial.

Comment author: paulfchristiano 21 February 2011 08:05:33PM | 0 points

I started from the view of multiplicative weights over computable predictors, and when I tried to describe it as an improvement on Solomonoff induction, I realized it was the same thing. If you write down what the two algorithms actually do, the equivalence to within a constant is clear.

Comment author: Wei_Dai 21 February 2011 09:18:59AM | 0 points

I mentioned to you a few days ago (in chat) that Scholarpedia stated:

The universal a priori probability m has many remarkable properties. A theorem of L.A. Levin states that among the lower semi-computable semi-measures it is the largest such function in the following sense: If p is a lower semi-computable semi-measure, then there is a constant c such that cm(x) >= p(x) for all x.

When you say "The tricky part for me was proving the equivalence of two views of the Solomonoff prior" do you mean you tried to re-derive this result?

Comment author: AlephNeil 22 February 2011 01:12:48AM | 0 points

For the discrete version, doesn't this boil down to converting a program that semi-computes a semi-measure into a program that "semi-samples from" a semi-measure?

Which is pretty straightforward:

We interpret the remainder of our (infinite) input as a real number y in [0,1]. Every time the measure that we're semi-computing adds an amount p of probability to one of its possible values v, we increment a variable x (which was initialised to 0) by p. If and when x finally exceeds y, we return the value v whose probability was just incremented.

(The continuous version, which is the one we actually need, looks harder, but I guess the same general idea should work. Perhaps we just repeat the algorithm above once for every bit of the output?)
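The discrete construction described above can be sketched in code; the generator interface and function names here are my assumptions, not part of the original comment.

```python
import random

def sample_from_semimeasure(enumerate_increments, rng=random.random):
    """Turn a program that lower semi-computes a semi-measure (emitting
    increments (v, p) that each raise P(v) by p) into a program that
    samples from it.  We draw y in [0, 1] (standing in for the
    remainder of the infinite random input), accumulate x += p as
    increments arrive, and output the value v whose increment first
    pushes x past y.  With probability equal to the semi-measure's
    deficiency (1 minus its total mass), no increment ever succeeds
    and the sampler runs forever without returning."""
    y = rng()   # stands in for the rest of the infinite input tape
    x = 0.0
    for v, p in enumerate_increments():   # possibly infinite stream
        x += p
        if x > y:
            return v
    return None  # only reachable if the increment stream is finite
```

For example, a semi-measure with P(0) = 0.5 and P(1) = 0.25 might be enumerated as the increments (0, 0.25), (1, 0.25), (0, 0.25); each value of y then lands in exactly one increment's interval, reproducing those probabilities.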

Comment author: cousin_it 21 February 2011 12:36:41PM | 0 points

Yes. I eventually succeeded, but not without a tiny bit of help from the texts.