"But what is a "reasonable" distribution? Why not label "reasonable" a very complicated prior distribution, which makes Occam's Razor work in all observed tests so far, but generates exceptions in future cases?"
Occam's Razor is only relevant to model selection problems. How complicated the prior distribution is does not matter. What does matter is how much the prior density at any given point in parameter space decreases as the model becomes more complex (gains parameters): each additional parameter spreads the same total prior probability over a larger parameter space.
For a more complex model to have a higher posterior probability, the evidence (likelihood) must raise the posterior more than the additional parameter(s) dilute the prior. Since it is possible to fit the model to noise (uncertainty) in the data, the maximized likelihood for the model with more parameters will be greater than or equal to that of a model with fewer parameters, at least when the simpler model is a special case of the more complex one. When an increase in likelihood comes from fitting real structure in the data rather than noise, the more complex model becomes more probable. Otherwise, the dilution of the prior over the larger parameter space reduces the probability of the model with more parameters.
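To make the trade-off concrete, here is a minimal sketch using a coin-flip example that is my own illustration, not something from the question. M0 is a hypothetical model with no free parameters (a fair coin, p = 0.5); M1 is a hypothetical model with one free parameter p and a uniform prior on [0, 1]. All function names are made up for this sketch, and the two models are assumed to have equal prior probability, so the ratio of marginal likelihoods equals the posterior odds.

```python
from math import factorial


def evidence_fair(k: int, n: int) -> float:
    """Marginal likelihood of one specific sequence with k heads in n flips
    under M0, where p is fixed at 0.5 (no free parameters)."""
    return 0.5 ** n


def evidence_uniform(k: int, n: int) -> float:
    """Marginal likelihood of the same sequence under M1, where p has a
    uniform prior on [0, 1]. The integral of p^k (1-p)^(n-k) over [0, 1]
    has the closed form k! (n-k)! / (n+1)!."""
    return factorial(k) * factorial(n - k) / factorial(n + 1)


def max_likelihood(k: int, n: int) -> float:
    """Best-fit likelihood under M1, evaluated at p = k/n."""
    p_hat = k / n
    return p_hat ** k * (1 - p_hat) ** (n - k)


def bayes_factor(k: int, n: int) -> float:
    """Evidence for M1 divided by evidence for M0 (values > 1 favour M1)."""
    return evidence_uniform(k, n) / evidence_fair(k, n)


n = 100
for k in (52, 80):
    print(
        f"k={k}: best-fit likelihood ratio = "
        f"{max_likelihood(k, n) / evidence_fair(k, n):.3g}, "
        f"Bayes factor (M1/M0) = {bayes_factor(k, n):.3g}"
    )
```

With 52 heads in 100 flips, M1 fits at least as well at its best-fit p (it can chase the noise), yet its Bayes factor is below 1, because the uniform prior spends most of its probability on values of p that the data rule out. With 80 heads, the gain in likelihood is real structure, and it outweighs the dilution of the prior.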
A reasonable distribution is one that assigns a reasonable prior to every parameter in the model. After that, Bayes' theorem takes care of the rest.
"But what is a "reasonable" distribution? Why not label "reasonable" a very complicated prior distribution, which makes Occam's Razor work in all observed tests so far, but generates exceptions in future cases?"
Occam's Razor is only relevant to model selection problems. A complicated prior distribution does not matter. What does matter is how much the prior distribution volume in parameter space decreases as the model becomes more complex (more parameters). Each additional parameter in the model spreads the prior distribution over an increased parameter space.
For a more complex model to have a higher posterior distribution, the evidence (likelihood) must increase the posterior volume more than the addition prior parameter(s) decrease it. Since it is possible to fit the model to noise (uncertainly) in the data, the likelihood for the model with more parameters will be greater or equal to the likelihood for a model with less parameters. When an increase in likelihood is due to fitting data instead of noise, the more complex model becomes more probable. Otherwise the decrease in the prior distribution volume reduces the probability for the model with more parameters.
A reasonable distribution is one that assigns a reasonable prior to all the parameters in the model. After that Bayes Theorem takes care of the rest.