Why should the Occamian prior work so well in the real world? It's a seemingly profound mystery that is asking to be dissolved.
To begin with, I propose a Lazy Razor and a corresponding Lazy prior:
Given several competing models of reality, we should select the one that is easiest to work with.
This is merely a formulation of the obvious trade-off between accuracy and cost. I would rather have a bad prediction today than a good prediction tomorrow or a great prediction ten years from now. Ultimately, this prior will deliver a good model, because it will let you try out many different models fast.
The concept of "easiness" may seem even more vague than "complexity", but I believe that in any specific context its measurement should be clear. Note that "easiness" is measured in man-hours, dollars, and so on; it is not to be confused with "hardness" in the sense of P and NP. If you still don't know how to measure "easiness" in your context, you should use the Lazy prior to choose an "easiness" measurement procedure. To break the recursive loop, know that the Laziest of all models is called "pulling numbers out of your ass".
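To make this a little more concrete, here is a minimal sketch of what an "easiness" measurement could look like, assuming it is just a weighted sum of man-hours and dollars. The class, field names, and hourly rate are invented for illustration, not a canonical definition.

```python
# A minimal sketch of one possible "easiness" measure: a weighted sum of
# estimated man-hours and out-of-pocket cost. The field names and the
# hourly rate are illustrative assumptions, not a canonical definition.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    build_hours: float   # estimated man-hours to get the model running
    run_hours: float     # estimated man-hours to use it for a prediction
    dollar_cost: float   # compute, data, equipment, etc.

def easiness_cost(m: Model, dollars_per_hour: float = 100.0) -> float:
    """Lower is easier. Everything is converted to a single dollar figure."""
    return (m.build_hours + m.run_hours) * dollars_per_hour + m.dollar_cost

# "Pulling numbers out of your ass" is the degenerate, maximally Lazy model:
guess = Model("wild guess", build_hours=0.0, run_hours=0.1, dollar_cost=0.0)
careful = Model("careful simulation", build_hours=40.0, run_hours=2.0, dollar_cost=500.0)

for m in (guess, careful):
    print(m.name, easiness_cost(m))
```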
Now let's return to the first question. Why should the Occamian prior work so well in the real world?
The answer is: it doesn't, not really. Of all the possible priors, the Occamian prior holds no special place. Its greatest merit is that it often resembles the Lazy prior in the probabilities it assigns. Indeed, it is easy to see that a random model with a billion parameters is disliked by both priors, and that a model with two parameters is loved by both. By the way, its second greatest merit is being easy to work with.
Note that the priors are not interchangeable. One case where they disagree is on making use of existing resources. Suppose mathematics has developed powerful tools for working with A-theory but not with B-theory. Then the Lazy prior would suggest that a complex model based on A-theory may be preferable to a simpler one based on B-theory. Or suppose some process took millions of years to produce abundant and powerful meat-based computers. Then the Lazy prior would suggest that we make use of them in our models, regardless of their complexity, while the Occamian prior would object.
Yes, the two priors aren't as close as I might have implied. But there are still many cases where they agree. For example, given a random 6-state TM and a random 7-state TM, both the Lazy and the Occamian prior will usually prefer the 6-state machine.
By the way, if I had to simulate these TMs by hand, I would care a lot about computation time; but now that we have cheap computers, computation time has a smaller coefficient, and the time spent building the TM matters more. That is how it works: "easiness" is measured in man-hours, not just in the number of steps the TM makes.
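As a toy illustration of both points, here is a sketch that scores two hypothetical machines under each prior. The state counts, hour estimates, and the small coefficient on machine time are all assumptions made up for the example.

```python
# A toy comparison of the two priors on random Turing machines. The numbers
# are invented for illustration; the point is only that both scores rank the
# 6-state machine above the 7-state one.
from dataclasses import dataclass

@dataclass
class TM:
    name: str
    states: int           # description size, in states
    build_hours: float    # man-hours to specify and debug the machine
    compute_hours: float  # machine-hours to run it

def occam_score(tm: TM) -> float:
    """Occamian prior: penalize description length only (smaller is better)."""
    return tm.states

def lazy_cost(tm: TM, compute_coeff: float = 0.01) -> float:
    """Lazy prior: penalize man-hours, with a small coefficient on machine
    time now that computers are cheap. The coefficient would be close to 1.0
    if we had to simulate the machine by hand."""
    return tm.build_hours + compute_coeff * tm.compute_hours

small = TM("6-state TM", states=6, build_hours=3.0, compute_hours=50.0)
big   = TM("7-state TM", states=7, build_hours=4.0, compute_hours=80.0)

for tm in (small, big):
    print(tm.name, "occam:", occam_score(tm), "lazy:", round(lazy_cost(tm), 2))
```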