Many thanks to Unknowns for inventing the scenario that led to this post, and to Wei Dai for helpful discussion.
Imagine you subscribe to the universal prior. Roughly, this means you assign credence 2^-k to each program of length k whose output matches your sensory inputs so far, and 0 to all programs that fail to match. Does this imply you should assign credence 2^-m to any statement about the universe ("hypothesis") that has length m? Or maybe Kolmogorov complexity m?
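(Spelling that prior out a bit, in the standard formalization rather than anything stated explicitly above: fix a universal machine U; the total credence given to an observation sequence x is roughly

$$M(x) \;=\; \sum_{p \,:\, U(p)\ \text{extends}\ x} 2^{-|p|},$$

the sum of 2^-|p| over all programs p whose output is consistent with x so far, with the 2^-k above being one term of that sum. The details of prefix-freeness and normalization are glossed over here.)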
The answer is no. Consider the following examples:
1. The complexity of "A and B and C and D" is roughly equal to the complexity of "A or B or C or D", but we know for certain that the former hypothesis can never be more probable than the latter, no matter what A, B, C and D are.
2. The hypothesis "the correct theory of everything is the lexicographically least algorithm with K-complexity 3^^^^3" is quite short, but the universal prior for it is astronomically low.
3. The hypothesis "if my brother's wife's first son's best friend flips a coin, it will fall heads" has quite high complexity, but should be assigned credence 0.5, just like its negation.
Instead, the right way to derive a prior over hypotheses from a prior over predictors should be to construct the set of all predictors (world-algorithms) that "match" the hypothesis, and see how "wide" or "narrow" that set is. There's no connection to the complexity of the hypothesis itself.
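In symbols (my gloss, not notation from the post): the induced prior of a hypothesis H is the sum of 2^-|p| over all world-programs p that satisfy H, i.e. the total weight of the matching set, which need not bear any relation to the length of the sentence stating H. Here's a toy numerical sketch of that idea (an illustration under made-up assumptions: "worlds" are shrunk to 4-bit strings of equal description length, and a "hypothesis" is any predicate on them):

```python
# Toy sketch (not from the post): "worlds" are all 4-bit strings, standing in
# for world-programs of equal length, so each gets the naive weight 2^-4.
# A "hypothesis" is just a predicate on worlds; its prior is the total weight
# of the worlds that satisfy it -- not anything about how long the sentence
# stating it happens to be.

from itertools import product

worlds = list(product([0, 1], repeat=4))    # 16 toy worlds
weight = {w: 2 ** -len(w) for w in worlds}  # 2^-4 = 0.0625 each

def prior(hypothesis):
    """Prior of a hypothesis = total weight of the worlds satisfying it."""
    return sum(weight[w] for w in worlds if hypothesis(w))

# A, B, C, D assert that bits 0..3 are 1, respectively.
A, B, C, D = (lambda w, i=i: w[i] == 1 for i in range(4))

print(prior(lambda w: A(w) and B(w) and C(w) and D(w)))  # 0.0625  (1/16)
print(prior(lambda w: A(w) or B(w) or C(w) or D(w)))     # 0.9375  (15/16)
# A hypothesis about one "coin flip" gets 0.5 no matter how verbose its statement:
print(prior(lambda w: w[2] == 1))                        # 0.5
```

The conjunction and the disjunction are statements of about the same length, yet they get priors 1/16 and 15/16; and the long-winded single-coin-flip hypothesis gets 0.5, matching examples 1 and 3 above.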
An exception is if the hypothesis gives an explicit way to construct a predictor that satisfies it. In that case the correct prior for the hypothesis is bounded from below by the "naive" prior implied by length, so it can't be too low. This isn't true for many interesting hypotheses, though. For example the words "Islam is true", even expanded into the complete meanings of these words as encoded in human minds, don't offer you a way to implement or predict an omnipotent Allah, so the correct prior value for the Islam hypothesis is not obvious.
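Stated as a bound (my notation, not the post's): if the hypothesis H explicitly exhibits an m-bit program whose predictions satisfy H, that one program already contributes its weight to the matching set, so

$$P(H) \;\ge\; 2^{-m},$$

which is why the "naive" length-based prior acts as a floor rather than as the actual value.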
I originally thought this idea might defuse Pascal's Mugging, but I was wrong about that - see Spurlock's comment and my reply.
Your post still wouldn't make any sense. Events/worlds have prior probability, but not complexity. Descriptions/algorithms/predictors have complexity, but not prior probability.
An infinite number of descriptions of different lengths predict the same event.
This confuses "A and B and C and D" as a description (which has complexity but no prior probability) with "A and B and C and D" as a subset of possible worlds (which has prior probability but no complexity).
Speaking about Kolmogorov complexity requires a lot of precision - we can pick a toy world, but there's just no way to use anything less than a Turing-complete description language.
Except that if the world can be described in m bits, 2^-m serves as a lower bound on its probability, as long as it's consistent with the data.
Is it? If you try to encode informal statements formally, they get very big. And doesn't "lexicographically least algorithm with K-complexity 3^^^^3" violate Rice's theorem? If it weren't for these problems, 2^-m would still be a proper lower bound on the prior probability of this.
Any statement about the world ("description") does have a prior probability, which is equal to the sum of prior probabilities of worlds satisfying that statement. This is kind of obvious.
The prior probability of a statement is not connected to its "complexity" (whatever that means). This is also kind of obvious, but many people have managed to miss it, so I wrote the post.
A formal encoding of example 2 (as a predicate on algorithms expressed in ZFC, say) isn't very long or difficult. The notion of K-complexity is easily definable, though...