This essay claims to refute a popularized understanding of Occam's Razor that I myself adhere to. It's confusing me, since I hold this belief at such a deep level that it's difficult for me to examine. Does anyone see any problems in its argument, or does it seem compelling? I specifically suspect it might be summarizing the relevant machine learning research badly, but I'm not very familiar with the field. It also might be failing to give simplicity any credit as a general heuristic when simplicity succeeds in a specific field, though it's unclear whether such credit would be justified. Finally, my intuition is that situations in nature with a steady bias toward growing complexity are more common than the author claims, and that such tendencies persist for longer than he allows. For all of this, though, I have no clear evidence to back up the ideas in my head, just vague notions that are difficult to examine. I'd appreciate someone else's perspective, as mine seems to be distorted.
Essay: http://bruce.edmonds.name/sinti/
It turns out there's an extremely straightforward mathematical reason why simplicity is, to some extent, an indicator of higher prior probability.
Consider the list of all possible hypotheses of finite length. We can imagine labeling this list: hypothesis 1, hypothesis 2, and so on, for an infinite number of hypotheses. This list contains the hypotheses capable of being distinguished by a human brain, input into a computer, having their predictions checked against one another, and other nice properties like that. To reason about which hypothesis is true, all we have to do is assign a probability to each one.
The obvious answer is just to give every hypothesis equal probability. But since there's an infinite number of these hypotheses, that can't work: we'd end up giving every hypothesis probability zero! So (and here's where it starts getting Occamian) it turns out that any valid probability assignment has to get smaller and smaller as we go to very high numbers in the list, so that the probabilities can all add up to 1. At low numbers in the list the probability is, in general, allowed to go up and down, but hypotheses with very high numbers always have to have low probability. More precisely, for any threshold ε, only finitely many hypotheses can have probability at least ε, since otherwise those alone would sum past 1.
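A minimal sketch of that pigeonhole point, using an assumed geometric prior p(n) = 2^-n over list positions (one of many valid choices; any assignment summing to 1 behaves the same way):

```python
# Toy prior over hypothesis indices n = 1, 2, 3, ...
# (an assumed choice for illustration, not the only valid one).
def prior(n):
    return 2.0 ** -n

# The probabilities (nearly) sum to 1; the tail beyond n = 60 is negligible.
total = sum(prior(n) for n in range(1, 61))

# For any threshold eps, at most 1/eps hypotheses can clear it,
# since their probabilities alone would otherwise sum past 1.
eps = 0.01
above = [n for n in range(1, 61) if prior(n) >= eps]
print(round(total, 9), above)  # 1.0 [1, 2, 3, 4, 5, 6]
```

Here only the first six positions have probability at least 0.01; every later hypothesis is forced below that threshold.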
There's a caveat, though: the position in the list can be arbitrary, and doesn't have to be based on simplicity. But it turns out to be impossible to order the hypotheses in any way at all without more complicated hypotheses ending up with higher numbers than simpler ones, on average.
There's a general argument for this (there's a more specific argument based on universal Turing machines that you can find in a good textbook) that's basically a reflection of the fact that there's a simplest hypothesis, but no "most complex" hypothesis, just as there's no biggest positive integer. Even if you tried to shuffle up the hypotheses really well, each simple hypothesis has to end up at some finite place in the list (otherwise it ends up at no place in the list and it's not a valid shuffling). And if the simple hypotheses are all at finite places in the list, there's still an infinite number of complex hypotheses with higher numbers, so complexity still has to grow, on average, for large enough places in the list.
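To make that last step concrete with a toy model (both parts are assumptions for illustration: hypothesis i is modeled as having complexity i, and the "shuffle" is an arbitrary bijection that reverses each block of ten indices), every simple hypothesis lands at some finite position, so all positions beyond the largest of those hold only complex hypotheses:

```python
# Toy bijection on the positive integers: reverse each block of 10.
# (An assumed shuffle for illustration; the argument holds for ANY bijection.)
def position(i):
    block, offset = divmod(i - 1, 10)
    return block * 10 + (10 - offset)

# Call complexity <= 25 "simple" (hypothesis i modeled as having complexity i).
B = 25
cutoff = max(position(i) for i in range(1, B + 1))
# Every position past `cutoff` holds a hypothesis of complexity > 25.
print(cutoff)  # 30
```

Whatever the bijection, the finitely many simple hypotheses occupy finitely many positions, so some cutoff like this always exists; past it, only complex hypotheses remain.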
Thanks for this! Apparently, many economists view Occam's Razor as just a modelling trick, judging from the conversations I've had on Reddit recently. I'd felt that perspective was incorrect for a while, but after encountering it so many times, and then later being directed to this paper, I'd begun to fear my epistemology was built on shaky foundations. It's a relief to see that's not the case.