Kindly comments on Open Problems Related to Solomonoff Induction - Less Wrong

Post author: Wei_Dai 06 June 2012 12:26AM




Comment author: Kindly 14 January 2015 02:52:19AM

The hypotheses are encoded in a prefix-free way: no hypothesis is a prefix of another hypothesis.

In particular, this eliminates your example: if both strings of length 1 ("0" and "1") are valid hypotheses, then no string of longer length can be a valid hypothesis (then we can take Omega=1 and assign a probability of 1/2 to each). Or we could have "0", "10", and "110" be valid hypotheses; then we can take Omega = 7/8 and assign probabilities of 4/7, 2/7 and 1/7 to these.

In general, the prefix-free condition ensures, via Kraft's inequality, that the sum of 2^-length over all hypotheses converges; that inequality does the real heavy lifting here. The constant is merely there to make the sum over all hypotheses converge to 1 instead.
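(A quick sketch, not from the original thread, checking the arithmetic in the "0", "10", "110" example above: the Kraft sum gives Omega = 7/8, and dividing each weight 2^-length by Omega yields the stated probabilities.)

```python
from fractions import Fraction

hypotheses = ["0", "10", "110"]

# Prefix-free check: no hypothesis is a prefix of another.
for a in hypotheses:
    for b in hypotheses:
        assert a == b or not b.startswith(a)

# Kraft sum: each hypothesis h gets initial weight 2^-len(h).
omega = sum(Fraction(1, 2 ** len(h)) for h in hypotheses)
print(omega)  # 7/8

# Normalize by Omega so the weights sum to 1.
probs = {h: Fraction(1, 2 ** len(h)) / omega for h in hypotheses}
print(probs)  # {'0': Fraction(4, 7), '10': Fraction(2, 7), '110': Fraction(1, 7)}
```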

You might imagine that hypotheses are required to be grammatically correct English sentences (I guess encoded in ASCII or something). In that case, no hypothesis is a prefix of another, because each hypothesis ends with a period.

Comment author: 3p1cd3m0n 15 January 2015 01:35:16AM

Right; I forgot that it used a prefix-free encoding. Apologies if the answer to this is painfully obvious, but does having a prefix-free encoding entail that there is a finite number of possible hypotheses?

Comment author: Kindly 15 January 2015 02:21:36AM

It doesn't: for example, the set of strings 0, 10, 110, 1110, 11110, 111110, ... is infinite and prefix-free. So is the set of C-style (null-terminated) strings.

Comment author: 3p1cd3m0n 16 January 2015 03:50:52AM

I see. Does the method of normalization you gave work even when there is an infinite number of hypotheses?

Comment author: Kindly 16 January 2015 03:58:50AM

It does. The infinite sum will always converge, so by normalizing we can make it converge to 1.

Take gjm's approach to explaining the normalization, in which the initial weight of 2^-length assigned to a hypothesis is the probability that you obtain that hypothesis by choosing bits at random. Then the normalization is just a conditional probability: you divide by the probability that, when choosing bits at random, you do eventually end up hitting a hypothesis.
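(A sketch of that conditional-probability reading, using the earlier "0", "10", "110" example rather than anything from the thread: random bits hit some hypothesis with probability Omega = 7/8, and conditioning on that event reproduces the normalized weights exactly.)

```python
from fractions import Fraction
from itertools import product

hypotheses = ["0", "10", "110"]

# Every hypothesis here has length <= 3, so enumerating all 3-bit strings
# (each with probability 1/8 under uniform random bits) covers every case.
hit_prob = Fraction(0)
first_hit = {}
for bits in product("01", repeat=3):
    s = "".join(bits)
    h = next((h for h in hypotheses if s.startswith(h)), None)
    if h is not None:
        hit_prob += Fraction(1, 8)
        first_hit[h] = first_hit.get(h, Fraction(0)) + Fraction(1, 8)

print(hit_prob)  # 7/8: the chance of eventually landing on some hypothesis
# Dividing by that chance is exactly the normalization:
print({h: p / hit_prob for h, p in first_hit.items()})
# {'0': Fraction(4, 7), '10': Fraction(2, 7), '110': Fraction(1, 7)}
```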

Comment author: gjm 15 January 2015 01:29:52PM

Remark: these are special cases of the schema "all strings that end with their single instance of pattern X, where X is only counted in possible positions Y", where in the first case X is "0" and Y is "anywhere" and in the second X is "00000000" and Y is "any position that's a multiple of 8 bits". (The "ends with" condition matters: strings merely containing exactly one "0" are not prefix-free, since "01" is a prefix of "011".)

(Of course there are plenty of other prefix-free sets of bitstrings. For instance: interpret blocks of 8 bits as characters as in Kindly's second example, and say that a valid program is anything that begins with a "(", has properly matched parentheses, and ends with the ")" matching the first "(". This doesn't have the convenient property that every bitstring begins with a valid program; for examples that do, take the exponential Golomb coding or the Elias omega coding of natural numbers.)
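(A sketch, not from the comment itself, of the order-0 exponential Golomb code mentioned above: the number n is encoded by writing n+1 in binary and prepending one "0" for each bit after the leading "1". The resulting codes are prefix-free, and a decoder can always peel one codeword off the front of a longer bitstring.)

```python
def exp_golomb_encode(n):
    # Order-0 exponential Golomb code for a natural number n >= 0:
    # write n+1 in binary, prepend len-1 zeros. 0 -> "1", 1 -> "010", 2 -> "011", ...
    b = bin(n + 1)[2:]
    return "0" * (len(b) - 1) + b

def exp_golomb_decode_prefix(bits):
    # Read one codeword off the front of a bitstring; return (n, remainder).
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    value = int(bits[zeros:2 * zeros + 1], 2)
    return value - 1, bits[2 * zeros + 1:]

codes = [exp_golomb_encode(n) for n in range(50)]

# No code is a prefix of another.
for a in codes:
    for b in codes:
        assert a == b or not b.startswith(a)

# Round trip: decoding the front of a concatenation recovers the number.
n, rest = exp_golomb_decode_prefix(exp_golomb_encode(7) + "10110")
print(n, rest)  # 7 10110
```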