Solomonoff induction

Discuss the wikitag on this page. Here is the place to ask questions and propose changes.

Suppose you have a program P that outputs a probability distribution over the next possible bit. With a fairly small constant amount of extra code, you can make an algorithm P' that takes in an arbitrary bitstring and decompresses it according to P. In other words, there are data compression algorithms that can use the probabilities to compress the environmental sequence. Now consider the set of programs of the form P'(b) for some bitstring b. The number of extra bits needed to encode b is the same as the number of bits P would lose by not being able to predict every step perfectly. In other words, the set of all deterministic programs P'(b) combines to give basically the same probability distribution as P (up to constant fiddle factors that depend on the choice of Turing machine).

In short, you can compress the random noise in the environment and put it into your predictive program. This gives a deterministic program, and it costs only a constant number of extra bits compared to admitting your uncertainty.
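A minimal sketch of the bookkeeping, in Python (the Laplace-rule predictor is my stand-in for P, an assumption; any next-bit predictor works the same way):

```python
import math

def p_next_is_one(history):
    """Hypothetical predictor P: probability the next bit is 1, given the
    history so far (Laplace's rule, chosen just for illustration)."""
    return (history.count(1) + 1) / (len(history) + 2)

def log_loss_bits(seq):
    """Total bits P 'loses' on seq by not predicting every step perfectly:
    -log2 of the probability P assigns to the whole sequence."""
    bits = 0.0
    for i, bit in enumerate(seq):
        p1 = p_next_is_one(seq[:i])
        p = p1 if bit == 1 else 1.0 - p1
        bits -= math.log2(p)
    return bits

seq = [0, 1, 1, 0, 1, 1, 1, 0]
# An arithmetic coder driven by the same predictor can encode seq as a
# bitstring b of about this many bits (within ~2); the deterministic
# program P'(b) then replays seq exactly.
print(log_loss_bits(seq))
```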

"diffraction" should be "refraction" throughout

The Solomonoff prior is not computed from the space each hypothesis needs to compute its answer; it is computed from the hypothesis's program length, the number of bits needed to pass the hypothesis to our UTM!
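Concretely, the usual textbook form of the prior (a sketch; the article's notation may differ):

```latex
% Prior weight of a finite string x: sum over every program p that makes
% the universal prefix machine U print an output beginning with x.
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}
% \ell(p) is p's length in bits (the cost of handing the hypothesis to
% the UTM), not the space or time U uses while running p.
```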

Any other form of induction one could try, such as the space or time complexity of evaluating a hypothesis that you described, other machine models that Turing machines can simulate, or human intuition, is included by Solomonoff induction as a hypothesis, and will come to dominate its predictions after at most finitely many errors if it outperforms.
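A toy illustration of that dominance in Python (both experts and the uniform prior are my assumptions; Solomonoff induction instead mixes over all programs with 2^-length weights):

```python
def uniform(history):
    return 0.5  # expert A: always predicts P(next bit = 1) = 0.5

def laplace(history):
    # expert B: Laplace-rule estimate from the history (assumption)
    return (history.count(1) + 1) / (len(history) + 2)

experts = [uniform, laplace]
weights = [0.5, 0.5]  # prior weights; 2^-length in the Solomonoff case

seq = [1] * 30  # an environment expert B predicts much better
for i, bit in enumerate(seq):
    preds = [e(seq[:i]) for e in experts]
    likes = [p if bit == 1 else 1.0 - p for p in preds]
    total = sum(w * l for w, l in zip(weights, likes))
    weights = [w * l / total for w, l in zip(weights, likes)]

print(weights)  # posterior mass concentrates on the better expert
```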

How would you even compute each program's space or time complexity? Rice's theorem (or a direct reduction from the halting problem) says it cannot be done in general.

Link to a description of the Library of Babel?

I imagine Ashley Watson leaning against the fourth wall as she delivers this line with an air of knowing satisfaction at how well she's fulfilling the purpose of her hypothetical existence.

Do we have (or need) any empirical evidence that algorithmic simplicity (space) is the ideal and ultimate absolute prior? If I reread the article carefully, I see it doesn't quite seem to advocate this, but I think it's very easy for a learner to pick up that misconception from the way AIXI is generally discussed, with its assumption that we somehow know space complexity is the final answer, and I wonder whether that should be ruled out or qualified here.

(I believe it's easy to pick up that misconception because it happened to me. I later came to realize I probably wouldn't bet at good odds that Space AIXI won't ever be dominated by a variant AIXI built with some other razor: the speed of generating environment Turing machines, for instance, instead of space. Or how about inverse cyclomatic complexity, or some more general measure of algorithm quality that humans have thus far lacked the breadth of mind to find, test, or work with mathematically? Or maybe even just the space complexity of some other machine model? Space complexity of TMs seems like extremely low-hanging fruit.)

I'm hoping to hear that someone has done the work of compiling some ridiculously extensive dataset of algorithms and their competitors and demonstrated that space complexity was more predictive of domination than any of the other general metrics we could think of. If this result has been found, nobody ever seems to cite it. Though I suppose we might not hear about it when so many people find it completely intuitive.

Looks like there is a typo in the fraction. What does $s_{1 \dots n} 1$ mean?

I tweaked this to make it a bit clearer while also cleaning up the LaTeX (I think it's standard to use \mathrm on multi-letter variable names, to disambiguate them from the multiplication of many single-letter variables, etc.), but I still think that parts of the equation (such as InterpretProb) need types and explanations. (I think you also want to explain how it handles non-halting programs, which requires renormalization in the Solomonoff induction case, unless you want to talk about the semimeasure instead.)
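For example (a generic illustration of the \mathrm convention, not the article's actual equation):

```latex
% Without \mathrm, a multi-letter name can parse as a product of variables:
% P(x) = InterpretProb(x)   reads as I * n * t * e * ...
% With \mathrm it reads as a single upright-roman name:
P(x) = \mathrm{InterpretProb}(x)
```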

I was trying to say "append 1 to the previous sequence". I guess it needs explanation.
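If that is the intended reading, it could be spelled out as follows (my guess at the notation, based on this reply):

```latex
% s_{1\dots n}1 denotes concatenation: the sequence s_1 s_2 \cdots s_n
% followed by the bit 1, not s_{1\dots n} multiplied by 1.
s_{1\dots n}1 \;=\; s_1 s_2 \cdots s_n\, 1
```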