I believe I've encountered a problem with either Solomonoff induction or my understanding of Solomonoff induction. I can't post about it in Discussion, as I have less than 20 karma, and the stupid questions thread is very full (I'm not even sure if it would belong there).
I've read about SI repeatedly over the last year or so, and I think I have a fairly good understanding of it. Good enough, at least, to follow along with informal reasoning about it. Recently I was reading Rathmanner and Hutter's paper and Legg's paper, spurred by renewed interest in AIXI as the theoretical "best intelligence" and in the Arcade Learning Environment used to test the computable Monte Carlo AIXI approximation. Then this problem came to me.
Solomonoff induction weighs each hypothesis by the length of the shortest Turing machine that outputs the observed bitstring. I saw this as a problem. Say AIXI was reasoning about a fair coin. It would guess before each flip whether it would come up heads or tails. *Because Turing machines are deterministic, AIXI cannot make hypotheses involving randomness.* To model the fair coin, AIXI would come up with increasingly convoluted Turing machines, attempting to compress a bitstring that approaches Kolmogorov randomness as its length approaches infinity. Meanwhile, AIXI would be punished and rewarded at random. This is not a satisfactory conclusion for a theoretical "best intelligence." So is the italicized statement a valid issue? An AI that can't defer reasoning about a problem by at least labeling it "sufficiently random, solve later" doesn't seem like a good AI, particularly in the real world, where chance plays a significant part.
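To make the worry concrete, here's a toy sketch of my own (not anything from the papers): any fixed deterministic prediction rule, no matter how convoluted, converges to chance-level accuracy against a fair coin.

```python
import random

random.seed(0)
N = 100_000
flips = [random.randint(0, 1) for _ in range(N)]

# A deterministic hypothesis is any fixed function from history to a guess.
# As a stand-in example: "predict that the previous flip repeats."
correct = sum(flips[i] == flips[i - 1] for i in range(1, N))
accuracy = correct / (N - 1)
print(accuracy)  # hovers around 0.5, as it would for any deterministic rule
```

Any other deterministic rule fares the same in the long run, which is why the reward signal looks random from AIXI's point of view.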
Naturally, Eliezer has already thought of this, and wrote about it in Occam's Razor:
The formalism of Solomonoff Induction measures the "complexity of a description" by the length of the shortest computer program which produces that description as an output. To talk about the "shortest computer program" that does something, you need to specify a space of computer programs, which requires a language and interpreter. Solomonoff Induction uses Turing machines, or rather, bitstrings that specify Turing machines. What if you don't like Turing machines? Then there's only a constant complexity penalty to design your own Universal Turing Machine that interprets whatever code you give it in whatever programming language you like. Different inductive formalisms are penalized by a worst-case constant factor relative to each other, corresponding to the size of a universal interpreter for that formalism.
In the better (IMHO) versions of Solomonoff Induction, the computer program does not produce a deterministic prediction, but assigns probabilities to strings. For example, we could write a program to explain a fair coin by writing a program that assigns equal probabilities to all 2^N strings of length N. This is Solomonoff Induction's approach to fitting the observed data. The higher the probability a program assigns to the observed data, the better that program fits the data. And probabilities must sum to 1, so for a program to better "fit" one possibility, it must steal probability mass from some other possibility which will then "fit" much more poorly. There is no superfair coin that assigns 100% probability to heads and 100% probability to tails.
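Eliezer's point about probability mass can be checked directly. A small illustration of my own: a Bernoulli model assigns a probability to every length-N string, the total is pinned at 1, so making some strings fit better necessarily makes others fit worse.

```python
from itertools import product

def seq_prob(bits, p_heads):
    # Probability a Bernoulli(p_heads) model assigns to a bit sequence (1 = heads).
    prob = 1.0
    for b in bits:
        prob *= p_heads if b == 1 else 1 - p_heads
    return prob

strings = list(product([0, 1], repeat=3))
fair = [seq_prob(s, 0.5) for s in strings]      # 2^-3 = 0.125 for every string
biased = [seq_prob(s, 0.9) for s in strings]

print(sum(fair), sum(biased))                   # both sum to 1
print(seq_prob((1, 1, 1), 0.9))                 # 0.729: fits all-heads far better
print(seq_prob((0, 0, 0), 0.9))                 # 0.001: mass stolen from all-tails
```

The fair-coin program is exactly the one assigning 2^-N to each length-N string; no "superfair" program can beat it everywhere at once.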
Does this warrant further discussion, if only to validate or refute this claim? I don't think Eliezer's proposal for a version of SI that assigns probabilities to strings is strong enough; it doesn't describe what form the hypotheses would take. Would hypotheses under this description be universal nondeterministic Turing machines, with the aforementioned probability distribution summed over the nondeterministic outputs?
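One way I can imagine the hypotheses looking (a sketch under my own assumptions, not a claim about what SI formally does): each hypothesis is a probabilistic program mapping history to a probability for the next bit, and a Bayes mixture over such programs concentrates on the fair-coin hypothesis when the data really is a fair coin.

```python
import random

random.seed(1)

# Hypotheses as probabilistic programs: each maps history -> P(next bit = 1).
hypotheses = {
    "fair":   lambda hist: 0.5,
    "biased": lambda hist: 0.9,
    "copy":   lambda hist: 0.99 if (hist and hist[-1] == 1) else 0.01,
}
# Prior weights: stand-ins for 2^-length(program) in real Solomonoff induction.
weights = {"fair": 0.5, "biased": 0.25, "copy": 0.25}

history = []
for _ in range(500):
    bit = random.randint(0, 1)  # a genuinely fair coin
    for name, h in hypotheses.items():
        p1 = h(history)
        weights[name] *= p1 if bit == 1 else 1 - p1
    total = sum(weights.values())
    weights = {k: v / total for k, v in weights.items()}
    history.append(bit)

print(weights)  # the "fair" hypothesis ends up with nearly all the mass
```

On this picture the agent never needs to "win" each flip; it just learns that the best compression of the stream is "fair coin" and stops wasting hypotheses on convoluted deterministic machines.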
I don't think anyone has pointed you in quite the right direction for getting a fully satisfactory answer to your question. I think what you're looking for is the concept of Continuous Universal A Priori Probability:
...The universal distribution m is defined in a discrete domain, its arguments are finite binary strings. For applications such as the prediction of growing sequences it is necessary to define a similar distribution on infinite binary sequences. This leads to the universal semi-measure M defined as the probability that the output of a monotone universal Turing machine starts with x when provided with fair coin flips on the input tape...
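Spelled out in the standard notation (following Li and Vitányi's and Hutter's presentations), the universal semi-measure is

```latex
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}
```

where U is a monotone universal Turing machine, the sum ranges over minimal programs p whose output begins with x, and ℓ(p) is the length of p in bits. Equivalently, M(x) is the probability that U's output starts with x when its input tape is filled with fair coin flips, which is exactly the kind of stochastic hypothesis space the original question was asking for.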