Why should the Occamian prior work so well in the real world? It's a seemingly profound mystery that is asking to be dissolved.
To begin with, I propose a Lazy Razor and a corresponding Lazy prior:
Given several competing models of reality, we should select the one that is easiest to work with.
This is merely a formulation of the obvious trade-off between accuracy and cost. I would rather have a bad prediction today than a good prediction tomorrow or a great prediction ten years from now. Ultimately, this prior will deliver a good model, because it will let you try out many different models fast.
The concept of "easiness" may seem even more vague than "complexity", but I believe that in any specific context its measurement should be clear. Note that "easiness" is measured in man-hours, dollars, and so on; it is not to be confused with "hardness" in the sense of P and NP. If you still don't know how to measure "easiness" in your context, you should use the Lazy prior to choose an "easiness" measurement procedure. To break the recursive loop, know that the Laziest of all models is called "pulling numbers out of your ass".
Now let's return to the first question. Why should the Occamian prior work so well in the real world?
The answer is, it doesn't, not really. Of all the possible priors, the Occamian prior holds no special place. Its greatest merit is that it often resembles the Lazy prior in the probabilities it assigns. Indeed, it is easy to see that a random model with a billion parameters is disliked by both priors, while a model with two parameters is loved by both. By the way, its second greatest merit is being easy to work with.
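As a toy illustration of this agreement (the numbers and the particular weighting functions below are made up for this sketch, not taken from the post):

```python
# Two toy "priors" over the same pair of models: an Occam-style weight that
# penalizes description length, and a Lazy-style weight that penalizes the
# cost of working with the model. Both strongly prefer the small model.

models = {
    "two_parameter_model":     {"params": 2,             "cost_hours": 1},
    "billion_parameter_model": {"params": 1_000_000_000, "cost_hours": 10_000},
}

def occam_weight(m):
    # roughly 2^(-description length), crudely using the parameter count
    # as a proxy for description length
    return 2.0 ** (-m["params"])

def lazy_weight(m):
    # cheaper-to-use models get more weight
    return 1.0 / m["cost_hours"]

for name, m in models.items():
    print(name, occam_weight(m), lazy_weight(m))
# two_parameter_model gets weights 0.25 and 1.0; the billion-parameter model's
# Occam weight underflows to 0.0 and its Lazy weight is 0.0001.
```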
Note, the priors are not interchangeable. One case where they disagree is on making use of existing resources. Suppose mathematics has derived powerful tools for working with A-theory but not B-theory. Then the Lazy prior would suggest that a complex model based on A-theory may be preferable to a simpler one based on B-theory. Or suppose some process took millions of years to produce abundant and powerful meat-based computers. Then the Lazy prior would suggest that we make use of them in our models, regardless of their complexity, while the Occamian prior would object.
I have a feeling that you are mixing up probability and decision theory. Given some observations, there are two separate questions when considering possible explanations/models:
1. What probability to assign to each model?
2. Which model to use?
Now, our toy model of perfect rationality would use some prior, e.g. the bit-counting universal/Kolmogorov/Occam one, and a Bayesian update to answer (1), i.e. compute the posterior distribution over models. Then it would weight these models by the "convenience of working with them", which goes into our expected utility maximization for answering (2), since we only have finite computational resources after all. In many cases we will be willing to work with known wrong-but-pretty-good models like Newtonian gravity, just because they are so much more convenient and good enough.
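A minimal sketch of this two-step procedure, with made-up numbers (the model names, priors, likelihoods and convenience weights below are illustrative only, not from the comment):

```python
# Step (1): Bayesian update -- what probability to assign to each model.
models = {
    "newtonian":    {"prior": 0.3, "likelihood": 0.80, "convenience": 1.0},
    "relativistic": {"prior": 0.3, "likelihood": 0.95, "convenience": 0.2},
    "epicycles":    {"prior": 0.4, "likelihood": 0.10, "convenience": 0.5},
}

evidence = sum(m["prior"] * m["likelihood"] for m in models.values())
posterior = {name: m["prior"] * m["likelihood"] / evidence
             for name, m in models.items()}

# Step (2): decision -- which model to adopt, weighting the posterior by how
# convenient the model is to actually work with (a crude stand-in for
# expected utility under bounded computational resources).
scores = {name: posterior[name] * m["convenience"] for name, m in models.items()}
adopted = max(scores, key=scores.get)

print(posterior)  # the relativistic model gets the highest posterior...
print(adopted)    # ...yet the Newtonian one wins once convenience is factored in
```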
I have a feeling that you correctly intuit that convenience should enter the question of which model to adopt, but misattribute this to the probability. Which model to adopt should formally be Bayesian update + utility maximization (taking convenience and bounded computational resources into account), not "Bayesian update only"; conflating the two is what leads you to the (imho questionable) conclusion that the universal/Kolmogorov/Occam prior is flawed for computing probability.
On the other hand, you are right that the above toy model of perfect rationality is computationally bad: computing the posterior distribution under some prior and then weighting by utility/convenience is kind of stupid if directly computing prior * convenience is cheaper than computing the prior and the convenience separately and then multiplying. More generally, probability is a nice concept for human minds to reason about reasoning, but ultimately we only care about decision theory.
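To spell out why the two computations agree on the decision (writing conv(m) for the convenience weight, a symbol introduced only for this sketch): since the evidence P(D) does not depend on the model m,

$$\arg\max_m \; P(m \mid D)\,\mathrm{conv}(m) \;=\; \arg\max_m \; P(D \mid m)\,P(m)\,\mathrm{conv}(m),$$

so adopting the model with the highest posterior-times-convenience is the same as a maximum a posteriori selection under the "pseudo-prior" P(m) conv(m), and the normalizing constant never needs to be computed.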
Always combining probability and utility might be a more correct model, but it is often conceptually more complex to my mind, which is why I don't try to always adopt it ;)
>But the greatest merit of Occamian prior is that it vaguely resembles the Lazy prior.
...
>With that in mind, I asked what prior would serve this purpose even better and arrived at Lazy prior. The idea of encoding these considerations in a prior may seem like an error of some kind, but the choice of a prior is subjective by definition, so it should be fine.
Encoding convenience * probability into some kind of pseudo-prior such that the expected-utility maximizer is the maximum likelihood model with respect to the pseudo-prior does seem like a really us...