If confines oneself to elaborate, baroque rules, but sticks to a particular style of baroque elaboration, one obtains the same effect.
The point of Kolmogorov complexity is that there is some limit to how baroque your rules can ever become, or how baroque you can make a fundamentally simple rule by choosing a really weird prior.
In algorithmic information theory, the problem of choosing a prior is equivalent to choosing a particular universal Turing machine. If you pick a really weird Turing machine, you will end up assigning low prior probability to things that "normal" people would consider simple - like a sine wave, for example. But because of universal computation, there's a limit to how low a probability you can assign. Your weird machine is still universal, so somehow or other it's got to be able to produce a sine wave, after whatever translation-prefix machinations you have to perform to get it to act like a normal programming language.
Another way of viewing this is just to eschew the notion of generalization and state that the goal of learning is compression. If you do this, you end up doing all the same kinds of work, with many fewer philosophical headaches. Now the great deep problem of picking a prior boils down to the rather more quotidian one of picking a data format to use to transmit/encode data.
But because of universal computation, there's a limit to how low a probability you can assign.
I don't understand this claim. Surely we can make the probability of a specific program as low as we want by restricting the programming language in ad hoc ways, while letting it stay Turing complete.
I declare this Open Thread open for discussion of Less Wrong topics that have not appeared in recent posts.