cousin_it comments on More and Less than Solomonoff Induction - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (32)
Solomonoff induction can handle that.
Yeah, albeit the shortest hypothesis that works keeps growing in length, which is really really bad for being able to compute something similar in practice.
That depends on what you mean by "shortest hypothesis". You can view Solomonoff induction as a mixture of all deterministic programs or, equivalently, a mixture of all computable distributions. If a "hypothesis" is a deterministic program, it will keep growing in length when given random bits. On the other hand, if a "hypothesis" is a computable distribution, its length might well stay bounded and not very large.
Well, it'd be an utter mess where you can't separate the "data" bits (ones that are transformed into final probability distribution if we assume they were random) from the "code" bits (that do the transformation). Indeed given that the universe itself can run computations, all "data" bits are also code.
If a "hypothesis" is a computable distribution, it doesn't need to include the random bits. (A computable distribution is a program that accepts a bit string and computes successive approximations to the probability of that bit string.)
Got to be a little more than that, for the probabilities of different bit strings to add up to 1. The obvious way to do that without having to sum all "probabilities" and renormalize, is to map an infinitely long uniformly distributed bit string into the strings for which you're defining probabilities.
In any case you're straying well away from what S.I. is defined to be.
I guess I left out too many details. Let's say a "hypothesis" is a program that accepts a bit string S and prints a sequence of non-negative rational numbers. For each S, that sequence has to be non-decreasing and converge to a limit P(S). The sum of P(S) over all S has to be between 0 (exclusive) and 1 (inclusive). Such an object is called a "lower-semicomputable semimeasure". SI is a universal lower-semicomputable semimeasure, which means it dominates any other up to a multiplicative constant. See page 8 of this paper for more on which classes of (semi-)measures have or don't have a universal element.
Yes, SI is such a measure, but the individual programs within SI as usually defined are not. You aren't feeding the input data into programs in SI, you match program outputs to the input data.
I think you can have an equivalent definition of SI in terms of programs that consume input data and print rational numbers. Or at least, I don't see any serious obstacles to doing so. It will also be expensive to approximate, and you'll need to renormalize as you go (and do other things like converting all sequences of rationals to non-decreasing sequences etc.), but at least you won't have the problem that "hypotheses" grow without bound.
That might be useful in some context, yes. But you'll still have both the program which is longer but gives probability of 1 to observations so far, and a program that is shorter, but gives a much much smaller probability (on the order of 2^-L where L is length difference).
I think the hypotheses in physics correspond not to individual programs in SI, but to common prefixes. For example, QM would describe a program that would convert subsequent random bits into predicted observations with correct probability distribution. You could treat a prefix as a probability distribution over outputs - the probability distribution that results from pre-pending that prefix to random bits. The bonus is that very simple operations on the random bits correspond to high level mathematical functions that are otherwise difficult to write in any low level TM-like representation.