MrMind comments on Understanding and justifying Solomonoff induction - Less Wrong

Post author: gedymin 15 January 2014 01:16AM


Comment author: MrMind 15 January 2014 10:10:11AM 0 points

Some random thoughts:

A. But when we're calculating the complexity of hypotheses, we're doing it in respect to a particular model of computing!

Yes, but every other model of universal computation gives the same results, except for a single additive constant (the cost of translating between the universal models).
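The invariance claim can be sketched with two toy description languages. Everything here is an invented illustration (the languages, the `exec_a` header, the programs): the point is only that wrapping one language's programs in a fixed-length header bounds the complexity gap by a constant that does not depend on the target string.

```python
# Toy sketch of the invariance theorem: two "universal" description
# languages whose complexities differ by at most a fixed constant.
# Both languages and the translator header are hypothetical.

def run_lang_a(program: str) -> str:
    """Language A: 'Rn:s' repeats the string s, n times."""
    n, s = program.split(":", 1)
    return s * int(n[1:])

def run_lang_b(program: str) -> str:
    """Language B: runs A-programs wrapped in a constant-length header."""
    assert program.startswith("exec_a ")
    return run_lang_a(program[len("exec_a "):])

def complexity(run, programs, target):
    """Length of the shortest listed program producing the target."""
    return min(len(p) for p in programs if run(p) == target)

target = "ab" * 50
k_a = complexity(run_lang_a, ["R50:ab"], target)            # 6
k_b = complexity(run_lang_b, ["exec_a R50:ab"], target)     # 13

# For every x, K_B(x) <= K_A(x) + 7: the overhead is exactly the
# header length, the single constant mentioned above.
print(k_a, k_b, k_b - k_a)
```

The same trick works in the other direction by giving language A a header for B-programs, which is why the difference is bounded on both sides.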

But who says it's a universal machine? It may be anything, really! For example, who says the universe doesn't work as a quasi-Turing machine that outputs a huge run of "1"s with 99% probability every time after it outputs a single "0"? The Universal Prior relies on the relation between the machine's inputs and outputs; if that relation is changed, the probabilities are going to be wrong.

What you're describing here is a probabilistic Turing machine. In terms of what they can compute, such machines have been proven equivalent to deterministic ones.

On the other hand, the physical model may be strictly more powerful than a Universal Turing machine - it may be a hypercomputer! If it is, the universe is uncomputable.

Yes! But if it is, either we get hold of a source of hypercomputation and proceed from there (although '"solomonoff induction" & hypercomputation' yields very few relevant results on Google), or we are never going to know that it is: we might suspect it as our models become longer and longer, but we will never be able to prove it.

Also, as B correctly said, the justification for Solomonoff induction is epistemic: we are doing the best we can (i.e. using a universal finite computational model), eliminating everything we know to be factually incorrect without throwing away anything that is not explicitly forbidden by the data.

Comment author: kokotajlod 15 January 2014 09:23:51PM * 2 points

Yes, but every other model of universal computation gives the same results, except for a single additive constant (the cost of translating between the universal models)

I don't see how this solves the problem. The other models of universal computation give different results, though they are similar in certain respects. For particular problems, like "Will the gravitational constant change tomorrow?" it doesn't seem to help.

Consider the hypotheses A = [Laws of physics] and B = [Laws of physics + 10% increase in gravitational constant tomorrow]

Every model of computation that we have used so far makes A simpler than B. But there are infinitely many models that make B simpler than A. Why should we prefer A to B?
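The point about rigged models can be made concrete. In this hypothetical sketch (the hypothesis strings and the one-symbol primitive `b` are invented for illustration), a language that happens to have a dedicated primitive for the "weird" hypothesis makes it shorter than the ordinary one:

```python
# Sketch of the relativity-of-complexity worry: description lengths
# depend on the language. All names and encodings here are invented.

PHYSICS = "standard-model-equations"              # stands in for A
PHYSICS_PLUS_JUMP = PHYSICS + "+G*=1.1G@tomorrow" # stands in for B

# An "honest" language spells each hypothesis out literally.
honest = {PHYSICS: PHYSICS, PHYSICS_PLUS_JUMP: PHYSICS_PLUS_JUMP}

# A rigged language has a one-symbol primitive 'b' abbreviating B,
# while A still has to be spelled out in full.
rigged = {PHYSICS: PHYSICS, PHYSICS_PLUS_JUMP: "b"}

def description_length(lang, hypothesis):
    return len(lang[hypothesis])

print(description_length(honest, PHYSICS)
      < description_length(honest, PHYSICS_PLUS_JUMP))   # True: A simpler
print(description_length(rigged, PHYSICS)
      < description_length(rigged, PHYSICS_PLUS_JUMP))   # False: B simpler
```

Since a primitive can be added for any hypothesis, there are infinitely many such rigged languages, one per hypothesis we might want to privilege.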

Maybe I'm just missing something. Various smart people have said that the relativity of complexity isn't a problem because of the constant-factor agreement between models/languages. However, various other smart people who have thought about it a lot have said the opposite, so I don't think I'm missing something obvious.

Maybe the answer has to do with a measure defined over the space of models? This would be nice, but I worry about just pushing the problem up a level.

Comment author: MrMind 16 January 2014 12:48:50AM * -1 points

I don't see how this solves the problem.

Well, first things first: what problem?
My observation was that it doesn't matter which model of universal computation we use, because, up to a constant, all the models give the same complexity. This means that one universal prior will differ from another by just a finite number of terms.

Why should we prefer A to B?

If A and B are the only surviving explanations, then by all means we shouldn't prefer A to B.
But the point is, in any situation where Solomonoff induction is used, you also have:
C = [Laws of physics + 10% increase in gravitational constant tomorrow + 10% decrease in two days]
D = [Laws of physics + 10% increase in gravitational constant tomorrow + 10% decrease in two days + 10% increase in three days]
E = [Laws of physics + 10% increase in gravitational constant tomorrow + 10% decrease in two days + 10% increase in three days + 10% decrease in four days]
and so on and so forth.

Let's say that to me, in order of probability, E > D > C > B > A, in perfectly anti-Occamian fashion. In any case, A + B + C + D + E will have some amount of probability x. You still have 1 - x to assign to all the other, longer programs: F, G, H, etc.
However small the probability of A, there will be a program (say Z) such that every program longer than Z has probability lower than A's. In this case you have a finite violation of Occam's razor, but the math forces you to prefer A as an explanation over the longer programs (longer than Z, that is).
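The tail argument can be sketched numerically under a toy prefix-free encoding (the encoding, the level parameter, and the probability `p_A` are all invented for illustration): a program is `'1'*k + '0'` followed by k data bits, so each program weighs 2^-(2k+1), and the combined weight of all programs at level k and beyond shrinks geometrically.

```python
# Sketch of the tail-mass argument under a toy prefix-free code:
# program = '1'*k + '0' + (k data bits), prior weight 2**-(2k + 1).
# The encoding and p_A are hypothetical choices for illustration.

def mass_at_level(k):
    """Total prior weight of all 2**k programs at level k."""
    return 2**k * 2.0**-(2*k + 1)     # equals 2**-(k + 1)

def tail_mass(k_min):
    """Total weight of every program at level k_min or beyond."""
    return sum(mass_at_level(k) for k in range(k_min, k_min + 200))

p_A = 1e-6   # however small a probability the anti-Occamian assigns A

# Find the level Z beyond which ALL longer programs together weigh
# less than A alone -- so each one individually is less likely too.
Z = next(k for k in range(1, 100) if tail_mass(k) < p_A)
print(Z, tail_mass(Z))
```

The geometric decay is what forces the violation of Occam's razor to be finite: no matter how small p_A is, some level Z exists with tail mass below it.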

Comment author: kokotajlod 16 January 2014 09:05:31PM 1 point

The problem being discussed is the relativity of complexity. So long as anything can be made out to be more complicated than anything else by an appropriate choice of language, it seems that Solomonoff Induction will be arbitrary, and we won't be justified in thinking that it is accurate.

Yes, one universal prior will differ from another by just a finite number of terms. But there is no upper bound on how large this finite number can be. So we can't make any claims about how likely specific predictions are, without arbitrarily ruling out infinite sets of languages/models. So the problem remains.

As you say, A is to be preferred to programs longer than Z. But there is no upper bound on how long Z might be. So any particular program--for example, B--is such that we have no reason to say that it is more or less likely than A. So it seems we have failed to find a full justification for why we should prefer A to B.

Unless, as I said, we start talking about the space of all possible languages/models. But as I said, this threatens to just push the problem up a level.