This essay claims to refute a popularized understanding of Occam's Razor that I myself adhere to. It's confusing me, since I hold this belief at such a deep level that it's difficult for me to examine. Does anyone see any problems in its argument, or does it seem compelling? I specifically suspect it might be summarizing the relevant machine learning research badly, but I'm not very familiar with the field. It also might be failing to give simplicity any credit as a general heuristic when simplicity succeeds in a specific field, though it's unclear whether such credit would be justified. Finally, my intuition is that situations in nature with a steady bias toward growing complexity are more common than the author claims, and that such tendencies persist for longer than he allows. For all of this, though, I have no clear evidence to back up the ideas in my head, just vague notions that are difficult to examine. I'd appreciate someone else's perspective, as mine seems to be distorted.
Essay: http://bruce.edmonds.name/sinti/
It turns out there's an extremely straightforward mathematical reason why simplicity is, to some extent, an indicator of higher prior probability.
Consider the list of all possible hypotheses of finite length. We can imagine labeling this list: hypothesis 1, hypothesis 2, and so on, for an infinite number of hypotheses. This list contains the hypotheses capable of being distinguished by a human brain, input into a computer, having their predictions checked against one another, and other nice properties like that. To reason about which hypothesis is true, all we have to do is assign a probability to each one.
The obvious answer is just to give every hypothesis equal probability. But since there's an infinite number of these hypotheses, that can't work: we'd end up giving every hypothesis probability zero! So (and here's where it starts getting Occamian) it turns out that any valid probability assignment has to get smaller and smaller as we go to very high numbers in the list, so that the probabilities can all add up to 1. At low numbers in the list the probability is, in general, allowed to go up and down, but hypotheses with very high numbers always have to have low probability. More precisely, for any threshold ε, only finitely many hypotheses can have probability at least ε, since otherwise those alone would sum past 1.
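A minimal sketch of that pigeonhole point, using an assumed geometric prior p(n) = 2^-n over list positions (one of many valid choices; any assignment summing to 1 behaves the same way):

```python
# Toy prior over hypothesis indices n = 1, 2, 3, ...
# (an assumed choice for illustration, not the only valid one).
def prior(n):
    return 2.0 ** -n

# The probabilities (nearly) sum to 1; the tail beyond n = 60 is negligible.
total = sum(prior(n) for n in range(1, 61))

# For any threshold eps, at most 1/eps hypotheses can clear it,
# since their probabilities alone would otherwise sum past 1.
eps = 0.01
above = [n for n in range(1, 61) if prior(n) >= eps]
print(round(total, 9), above)  # 1.0 [1, 2, 3, 4, 5, 6]
```

Here only the first six positions have probability at least 0.01; every later hypothesis is forced below that threshold.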
There's a caveat, though: the position in the list can be arbitrary, and doesn't have to be based on simplicity. But it turns out to be impossible to order the hypotheses in any way at all without more complicated hypotheses ending up with higher numbers than simpler ones, on average.
There's a general argument for this (there's a more specific argument based on universal Turing machines that you can find in a good textbook) that's basically a reflection of the fact that there's a simplest hypothesis, but no "most complex" hypothesis, just as there's no biggest positive integer. Even if you tried to shuffle up the hypotheses really well, each simple hypothesis has to end up at some finite place in the list (otherwise it ends up at no place in the list and it's not a valid shuffling). And if the simple hypotheses are all at finite places in the list, there's still an infinite number of complex hypotheses with higher numbers, so complexity still has to grow, on average, for large enough places in the list.
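To make that last step concrete with a toy model (both parts are assumptions for illustration: hypothesis i is modeled as having complexity i, and the "shuffle" is an arbitrary bijection that reverses each block of ten indices), every simple hypothesis lands at some finite position, so all positions beyond the largest of those hold only complex hypotheses:

```python
# Toy bijection on the positive integers: reverse each block of 10.
# (An assumed shuffle for illustration; the argument holds for ANY bijection.)
def position(i):
    block, offset = divmod(i - 1, 10)
    return block * 10 + (10 - offset)

# Call complexity <= 25 "simple" (hypothesis i modeled as having complexity i).
B = 25
cutoff = max(position(i) for i in range(1, B + 1))
# Every position past `cutoff` holds a hypothesis of complexity > 25.
print(cutoff)  # 30
```

Whatever the bijection, the finitely many simple hypotheses occupy finitely many positions, so some cutoff like this always exists; past it, only complex hypotheses remain.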
Thanks for this! Apparently, many economists view Occam's Razor as just a modelling trick, judging from the conversations I've had on Reddit recently. I'd felt that perspective was incorrect for a while, but after encountering it so many times, and then later being directed to this paper, I'd begun to fear my epistemology was built on shaky foundations. It's a relief to see that's not the case.