A common feature of many proposed logical priors is a preference for simple sentences over complex ones. This is sort of like an extension of Occam's razor into math. Simple things are more likely to be true. So, as it is said, "why not?"
Well, the analogy has some wrinkles - unlike hypothetical rules for the world, logical sentences do not form a mutually exclusive set. Instead, for every sentence A there is a sentence not-A with pretty much the same complexity, and probability 1-P(A). So you can't make the probability smaller for all complex sentences, because their negations are also complex sentences! If you don't have any information that discriminates between them, A and not-A will both get probability 1/2 no matter how complex they get.
But if our agent knows something that breaks the symmetry between A and not-A, like that A belongs to a mutually exclusive and exhaustive set of sentences with differing complexities, then it can assign higher probabilities to simpler sentences in this set without breaking the rules of probability. Except, perhaps, the rule about not making up information.
The question: is the simpler answer really more likely to be true than the more complicated answer, or is this just a delusion? If so, is it for some ontologically basic reason, or for a contingent and explainable reason?
There are two complications to draw your attention to. The first is in what we mean by complexity. Although it would be nice to use the Kolmogorov complexity of any sentence, which is the length of the shortest program that prints the sentence, such a thing is uncomputable by the kind of agent we want to build in the real world. The only thing our real-world agent is assured of seeing is the length of the sentence as-is. We can also find something in between Kolmogorov complexity and length by doing a brief search for short programs that print the sentence - this meaning is what is usually meant in this article, and I'll call it "apparent complexity."
The second complication is in what exactly a simplicity prior is supposed to look like. In the case of Solomonoff induction the shape is exponential - more complicated hypotheses are exponentially less likely. But why not a power law? Why not even a Poisson distribution? Does the difficulty of answering this question mean that thinking that simpler sentences are more likely is a delusion after all?
Thought experiments:
1: Suppose our agent knew from a trusted source that some extremely complicated sum could only be equal to A, or to B, or to C, which are three expressions of differing complexity. What are the probabilities?
Commentary: This is the most sparse form of the question. Not very helpful regarding the "why," but handy to stake out the "what." Do the probabilities follow a nice exponential curve? A power law? Or, since there are just the three known options, do they get equal consideration?
This is all based off intuition, of course. What does intuition say when various knobs of this situation are tweaked - if the sum is of unknown complexity, or of complexity about that of C? If there are a hundred options, or countably many? Intuitively speaking, does it seem like favoring simpler sentences is an ontologically basic part of your logical prior?
2: Consider subsequences of the digits of pi. If I give you a pair (n,m), you can tell me the m digits following the nth digit of pi. So if I start a sentence like "the subsequence of digits of pi (10100, 102) = ", do you expect to see simpler strings of digits on the right side? Is this a testable prediction about the properties of pi?
Commentary: We know that there is always a short-ish program to produce the sequences, which is just to compute the relevant digits of pi. This sets a hard upper bound on the possible Kolmogorov complexity of sequences of pi (that grows logarithmically as you increase m and n), and past a certain m this will genuinely start restricting complicated sequences, and thus favoring "all zeros" - or does it?
After all, this is weak tea compared to an exponential simplicity prior, for which the all-zero sequence would be hojillions of times more likely than a messy one. On the other hand, an exponential curve allows sequences with higher Kolmogorov complexity than the computation of the digits of pi.
Does the low-level view outlined in the first paragraph above demonstrate that the exponential prior is bunk? Or can you derive one from the other with appropriate simplifications (keeping in mind Komogorov complexity vs. apparent complexity)? Does pi really contain more long simple strings than expected, and if not what's going on with our prior?
3: Suppose I am writing an expression that I want to equal some number you know - that is, the sentence "my expression = your number" should be true. If I tell you the complexity of my expression, what can you infer about the likelihood of the above sentence?
Commentary: If we had access to Kolmogorov complexity of your number, then we could completely rule out answers that were too K-simple to work. With only an approximation, it seems like we can still say that simple answers are less likely up to a point. Then as my expression gets more and more complicated, there are more and more available wrong answers (and, outside of the system a bit, it becomes less and less likely that I know what I'm doing), and so probability goes down.
In the limit that my expression is much more complex than your number, does an elegant exponential distribution emerge from underlying considerations?
Can I ask you some more questions?
In thought experiment 2, the subsequences of pi, it seems like simpler sentences really are favored (because sequences more K-complex than the description are ruled out). Do you agree? Do you think this constitutes an empirical prediction that the all-zeroes subsequence will be overrepresented in pi?
How do you think a logical prior should behave in the limit that it's applying to a countable set of sentences? After all, you can't have a uniform distribution on the integers. Should it stay uniform for every finite number of sentences but be nonuniform if we go up a meta-level and consider an infinite set rather than individual sentences?
Well, computable normal numbers exist. if you replace pi with such a number, then we know that strings of zeroes aren't overrepresented in the sense of asymptotic frequency. They might be overrepresented in some other sense, though. Can you define that precisely?
I don't see why the prior should be uniform. None of the proposed priors are. Maybe I misunderstand your question?