A common feature of many proposed logical priors is a preference for simple sentences over complex ones. This is sort of like an extension of Occam's razor into math. Simple things are more likely to be true. So, as it is said, "why not?"
Well, the analogy has some wrinkles - unlike hypothetical rules for the world, logical sentences do not form a mutually exclusive set. Instead, for every sentence A there is a sentence not-A with pretty much the same complexity, and probability 1-P(A). So you can't make the probability smaller for all complex sentences, because their negations are also complex sentences! If you don't have any information that discriminates between them, A and not-A will both get probability 1/2 no matter how complex they get.
But if our agent knows something that breaks the symmetry between A and not-A, like that A belongs to a mutually exclusive and exhaustive set of sentences with differing complexities, then it can assign higher probabilities to simpler sentences in this set without breaking the rules of probability. Except, perhaps, the rule about not making up information.
The question: is the simpler answer really more likely to be true than the more complicated answer, or is this just a delusion? If so, is it for some ontologically basic reason, or for a contingent and explainable reason?
There are two complications to draw your attention to. The first is in what we mean by complexity. Although it would be nice to use the Kolmogorov complexity of any sentence, which is the length of the shortest program that prints the sentence, such a thing is uncomputable by the kind of agent we want to build in the real world. The only thing our real-world agent is assured of seeing is the length of the sentence as-is. We can also find something in between Kolmogorov complexity and length by doing a brief search for short programs that print the sentence - this meaning is what is usually meant in this article, and I'll call it "apparent complexity."
The second complication is in what exactly a simplicity prior is supposed to look like. In the case of Solomonoff induction the shape is exponential - more complicated hypotheses are exponentially less likely. But why not a power law? Why not even a Poisson distribution? Does the difficulty of answering this question mean that thinking that simpler sentences are more likely is a delusion after all?
Thought experiments:
1: Suppose our agent knew from a trusted source that some extremely complicated sum could only be equal to A, or to B, or to C, which are three expressions of differing complexity. What are the probabilities?
Commentary: This is the most sparse form of the question. Not very helpful regarding the "why," but handy to stake out the "what." Do the probabilities follow a nice exponential curve? A power law? Or, since there are just the three known options, do they get equal consideration?
This is all based off intuition, of course. What does intuition say when various knobs of this situation are tweaked - if the sum is of unknown complexity, or of complexity about that of C? If there are a hundred options, or countably many? Intuitively speaking, does it seem like favoring simpler sentences is an ontologically basic part of your logical prior?
2: Consider subsequences of the digits of pi. If I give you a pair (n,m), you can tell me the m digits following the nth digit of pi. So if I start a sentence like "the subsequence of digits of pi (10100, 102) = ", do you expect to see simpler strings of digits on the right side? Is this a testable prediction about the properties of pi?
Commentary: We know that there is always a short-ish program to produce the sequences, which is just to compute the relevant digits of pi. This sets a hard upper bound on the possible Kolmogorov complexity of sequences of pi (that grows logarithmically as you increase m and n), and past a certain m this will genuinely start restricting complicated sequences, and thus favoring "all zeros" - or does it?
After all, this is weak tea compared to an exponential simplicity prior, for which the all-zero sequence would be hojillions of times more likely than a messy one. On the other hand, an exponential curve allows sequences with higher Kolmogorov complexity than the computation of the digits of pi.
Does the low-level view outlined in the first paragraph above demonstrate that the exponential prior is bunk? Or can you derive one from the other with appropriate simplifications (keeping in mind Komogorov complexity vs. apparent complexity)? Does pi really contain more long simple strings than expected, and if not what's going on with our prior?
3: Suppose I am writing an expression that I want to equal some number you know - that is, the sentence "my expression = your number" should be true. If I tell you the complexity of my expression, what can you infer about the likelihood of the above sentence?
Commentary: If we had access to Kolmogorov complexity of your number, then we could completely rule out answers that were too K-simple to work. With only an approximation, it seems like we can still say that simple answers are less likely up to a point. Then as my expression gets more and more complicated, there are more and more available wrong answers (and, outside of the system a bit, it becomes less and less likely that I know what I'm doing), and so probability goes down.
In the limit that my expression is much more complex than your number, does an elegant exponential distribution emerge from underlying considerations?
With regards to thought experiment 2, I think a useful reference point is completely random strings of digits. If you were indeed picking the strings of digits totally at random, you would expect a distribution in which:
1) Any string of digits is equally likely.
2) A significant majority of the strings has complexity of roughly m (plus some constant overhead).
Now, in the actual thought experiment, the string of digits is quite obviously not random; as you've said, the Kolmogorov complexity has a hard upper bound of C + log(m) + log(n), which in general could be significantly smaller than m.
Here's one alternative hypothesis that I can think of, which makes sense to me but could easily be wrong:
H2: In general, such strings will appear to be mostly random, except for the simple fact that they actually occur relatively early in pi.
If this is true, one would expect that for quite a lot of those strings there will not be a more compact representation than a program that, in one way or another, explicitly computes those digits of pi.
Also, I think I've thought of an interesting way of testing a prediction along the lines you've suggested. If we take S(n, m, pi) to mean "the m digits following the nth digit of pi", let's define
N(s, pi) := the smallest integer n such that S(n, len(s), pi) = s
If, as you suggest, simpler strings should be "more likely", then it would follow that for simpler strings s, you would expect to see lower values of N(s, pi). In particular, how about the following experiments:
i) Generate a bunch of random sequences R (all of length m) and look at the distribution you get for N(r, pi). You can also use your apparent complexity checker on those sequences, but in general you would expect most of them to have complexity of roughly m.
ii) Pick out a bunch of strings Q that are demonstrably simple, e.g. 0000000000, 1111111111, 0123456789, and look at the values you get for N(q, pi)
iii) Pick a different number, which you think has similar properties to pi in this regard - let's say e. Now, for a given number n, you can calculate
F(n) := N(S(n, m, e), pi)
which basically means "for a sequence occurring at position n in e, what position does it first occur at in pi"?
I think if your hypothesis is true, it will hold that the values for (ii) are typically much smaller than the values in (i), and the values for F(n) in (iii) will generally tend to increase as n increases (up to a finite limit, because eventually you will have covered all sequences of length m). On the other hand, if H2 is correct, the values for (i), (ii), and (iii) will all be pretty similar.
Now, with all that being said, I'm not sure that your thought experiment #2 carries across to a prior as to whether or not certain logical statements are true. It may be an interesting experiment to actually try in practice, though.
A useful point of contrast is the idea of a normal number, as pi and e are widely believed to be. The crucial difference is that normal-ness requires the asymptotic frequencies to be the same for all strings of the same length, which is very different to the criterion I mentioned above.