Many thanks to Unknowns for inventing the scenario that led to this post, and to Wei Dai for helpful discussion.
Imagine you subscribe to the universal prior. Roughly, this means you assign credence 2^-k to each program of length k whose output matches your sensory inputs so far, and 0 to every program that fails to match. Does this imply you should assign credence 2^-m to any statement about the universe ("hypothesis") that has length m? Or perhaps to any statement with Kolmogorov complexity m?
The answer is no. Consider the following examples:
1. The complexity of "A and B and C and D" is roughly equal to the complexity of "A or B or C or D", but we know for certain that the former hypothesis can never be more probable than the latter, no matter what A, B, C and D are.
2. The hypothesis "the correct theory of everything is the lexicographically least algorithm with K-complexity 3^^^^3" is quite short, but the universal prior for it is astronomically low.
3. The hypothesis "if my brother's wife's first son's best friend flips a coin, it will fall heads" has quite high complexity, but should be assigned credence 0.5, just like its negation.
Instead, the right way to derive a prior over hypotheses from a prior over predictors should be to construct the set of all predictors (world-algorithms) that "match" the hypothesis, and see how "wide" or "narrow" that set is. There's no connection to the complexity of the hypothesis itself.
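As a rough illustration of this construction, here is a minimal Python sketch. Everything in it is a toy assumption rather than part of the argument above: the four "world-programs", their lengths, and the facts each one makes true are invented, and the weights are normalized over this finite set only so that the numbers read as probabilities. The prior of a hypothesis is computed as the total weight of the programs that satisfy it, so the length of the hypothesis's own description never enters.

```python
from fractions import Fraction

# Hypothetical toy world-programs: (name, length in bits, facts the program makes true).
# All are assumed to match the observations so far.
PROGRAMS = [
    ("p1", 3, {"A", "B", "C", "D"}),
    ("p2", 2, {"A"}),
    ("p3", 4, {"B", "C"}),
    ("p4", 5, set()),
]

def weight(length):
    """Unnormalized prior weight 2^-k for a program of length k."""
    return Fraction(1, 2 ** length)

def hypothesis_prior(hypothesis, programs=PROGRAMS):
    """Share of total weight held by the programs that satisfy the hypothesis."""
    total = sum(weight(k) for _, k, _ in programs)
    matching = sum(weight(k) for _, k, facts in programs if hypothesis(facts))
    return matching / total

def conjunction(facts):
    """'A and B and C and D' holds only if all four facts are true."""
    return {"A", "B", "C", "D"} <= facts

def disjunction(facts):
    """'A or B or C or D' holds if at least one fact is true."""
    return bool({"A", "B", "C", "D"} & facts)

p_and = hypothesis_prior(conjunction)
p_or = hypothesis_prior(disjunction)
print(p_and, p_or)
```

The two hypotheses are about equally long to write down, yet the conjunction is satisfied by a subset of the programs that satisfy the disjunction, so its prior can never be larger, whatever toy programs are chosen.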
An exception is if the hypothesis gives an explicit way to construct a predictor that satisfies it. In that case the correct prior for the hypothesis is bounded from below by the "naive" prior implied by length, so it can't be too low. This isn't true for many interesting hypotheses, though. For example the words "Islam is true", even expanded into the complete meanings of these words as encoded in human minds, don't offer you a way to implement or predict an omnipotent Allah, so the correct prior value for the Islam hypothesis is not obvious.
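The lower bound in the constructive case can be written out as a short sketch. The notation is an assumption introduced here, not taken from the post: P(H) is the unnormalized weight of the hypothesis H, the sum runs over programs p that match the observations so far and satisfy H, p_H is the predictor that H tells you how to build, m is the length of H, and c is a constant overhead for turning the text of H into a program.

```latex
P(H) \;=\; \sum_{p \text{ satisfies } H} 2^{-|p|}
\;\ge\; 2^{-|p_H|}
\;\ge\; 2^{-(m + c)},
\qquad \text{assuming } |p_H| \le m + c .
```

The first inequality holds because p_H is one term of the sum. When H gives no way to build such a p_H, nothing of this kind is available, and the sum has to be estimated some other way.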
This idea may or may not defuse Pascal's Mugging - I'm not sure yet. Sorry, I was wrong about that; see Spurlock's comment and my reply.
I think your general conclusion is correct: "does subscription to the universal prior imply assigning probability 2^-m to any hypothesis of length/complexity m? .. ahh no". But perhaps obviously so?
The key is the universal prior deals with programs that are complete predictors of an entire sequence of observations. They are not just simple statements.
If you want to compare to simple statements, you need to quantify their predictive power. The universal prior is a secondary classification, a way of ranking a set of algorithms/hypotheses that are all 100% accurate and specific. It's like a secondary weighting procedure you use when you have many equally perfect choices given your current data. (That is my understanding, as refreshed by your description.)
That clearly can't apply to arbitrary statements, because statements do not, in general, perfectly predict the whole sequence.
For #1, what are A, B, C, D supposed to be? Complete world predictor programs? Simple statements?
If A&B&C&D is a complete predictor, and so is A|B|C|D, then the universal prior will not choose either: it will choose the simplest of A, B, C, D.
Otherwise, if A&B&C&D is required for a full predictor, then A|B|C|D cannot be a full predictor. So there is no case where A&B&C&D has less predictive power than A|B|C|D. The former is more specific and thus has strictly more predictive power, though of course it is intrinsically less probable.