It would follow, then, that when searching for a theory, the simplest one will not always be the correct one, since nature did not necessarily choose the simplest phenomenon capable of generating the observations. I think that may be what the essay is really getting at.
It might be a difference of starting points, then. We can either start with a universal approach, a broad prior, and use general heuristics like Occam's Razor, then move towards the specifics of a situation, or we can start with a narrow prior and a view informed by local context, to see how Nature typically operates in such domains according to the evidence of our intuitions, then try to zoom out. Of course both approaches have advantages in some cases, so what's actually being debated is their relative frequency.
I don't know of any good, unbiased way to survey the problem space and assess whether this assertion is typically true (maybe Monte Carlo simulations over random algorithms or something ridiculous like that?), but the point that adding unnecessary assumptions to a theory is flawed practice seems like a good heuristic argument that we should generally assume simplicity. Does the fact that naive neural nets almost always fail when applied to out-of-sample data constitute a strong general argument against the anti-universalizing approach? Or am I just mixing metaphors recklessly here, with this whole "localism" thing? Simplicity and generalizability are more or less the same thing, right? Or is that question assuming the conclusion once again?
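To make the out-of-sample question a bit more concrete, here is a toy sketch. It is not anything from the essay, and it uses polynomial degree as a stand-in for model complexity rather than an actual neural net. Everything in it is an assumption chosen for illustration: the "true phenomenon" is a cubic, the noise level is 0.3, the candidate degrees are 1, 3, and 15, and the names `true_phenomenon` and `complexity_sweep` are made up. It fits each model on a small training sample and compares training error with error on held-out points.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_phenomenon(x):
    # Assumed data-generating process: a cubic, so its "true" complexity is degree 3.
    return 4 * x**3 - 3 * x


def complexity_sweep(degrees=(1, 3, 15), n_train=20, n_test=200,
                     noise_sd=0.3, seed=0):
    """Fit one polynomial per degree; return {degree: (train_mse, test_mse)}."""
    rng = np.random.default_rng(seed)

    # Small noisy training sample drawn at random from [-1, 1].
    x_train = rng.uniform(-1, 1, n_train)
    y_train = true_phenomenon(x_train) + rng.normal(0, noise_sd, n_train)

    # Held-out sample from the same process, on an even grid.
    x_test = np.linspace(-1, 1, n_test)
    y_test = true_phenomenon(x_test) + rng.normal(0, noise_sd, n_test)

    results = {}
    for degree in degrees:
        # Least-squares polynomial fit of the given degree.
        model = Polynomial.fit(x_train, y_train, degree, domain=[-1, 1])
        train_mse = float(np.mean((model(x_train) - y_train) ** 2))
        test_mse = float(np.mean((model(x_test) - y_test) ** 2))
        results[degree] = (train_mse, test_mse)
    return results


if __name__ == "__main__":
    for degree, (train_mse, test_mse) in complexity_sweep().items():
        print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Because the models are nested, training error can only go down as the degree goes up, so training error alone says nothing about which model is right. On held-out data one would expect the too-simple line and the too-flexible degree-15 fit to both do worse than the cubic, which is at least suggestive that "simple" and "generalizes" are related but not identical. A single seed on a single toy problem is of course nowhere near the unbiased survey of the problem space asked about above.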
Does the fact that naive neural nets almost always fail when applied to out-of-sample data constitute a strong general argument against the anti-universalizing approach?
I think this demonstrates the problem rather well. In the end, the phenomenon you are trying to model has a level of complexity N. You want your model (neural network, theory, or whatever) to have the same level of complexity: no more, no less. So the fact that naive neural nets fail on out-of-sample data for a given problem shows that the neural network did not reach sufficient comple...
This essay claims to refute a popularized understanding of Occam's Razor that I myself adhere to. It confuses me, since I hold this belief at such a deep level that it's difficult for me to examine. Does anyone see any problems in its argument, or does it seem compelling? I specifically feel as though it might be summarizing the relevant machine learning research badly, but I'm not very familiar with the field. It also might be failing to give any credit to simplicity as a general heuristic when simplicity succeeds in a specific field, and it's unclear whether such credit would be justified. Finally, my intuition is that situations in nature where there is a steady bias towards growing complexity are more common than the author claims, and that such tendencies are stronger and last longer. However, I have no clear evidence to back up any of this, just vague notions that are difficult to examine. I'd appreciate someone else's perspective, as mine seems to be distorted.
Essay: http://bruce.edmonds.name/sinti/