Does the fact that naive neural nets almost always fail when applied to out of sample data constitute a strong general argument against the anti-universalizing approach?
I think this demonstrates the problem rather well. In the end, the phenomenon you are trying to model has a level of complexity N. You want your model (neural network or theory or whatever) to have the same level of complexity - no more, no less. So the fact that naive neural nets fail on out of sample data for a given problem shows that the neural network did not reach sufficient complexity. That most naive neural networks fail shows that most problems have at least a bit more complexity than that embodied in the simplest neural networks.
As for how to approach the problem in view of all this... Consider this: for any particular problem of complexity N, there are N - 1 levels of complexity below it, which may fail to make accurate predictions due to oversimplification. And then there's an infinity of complexity levels above N, which may fail to make accurate predictions due to overfitting. So it makes sense to start with simple theories, keep adding complexity as new observations arrive, and gradually improve our predictions until we have the simplest theory we can find that still produces low errors when predicting new observations.
I say low errors because to truly match all observations would certainly be overfitting! So at the end we have the same problem again, trading off accuracy on current data against overfitting errors on future data... Simple (higher errors) versus complex (higher overfitting)... At the end of the process, only empiricism can help us find the theory that produces the lowest error on future data!
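To make the "start simple and add complexity" procedure concrete, here is a minimal sketch in which polynomial degree stands in for "level of complexity". Everything in it is an assumption made for illustration: the synthetic quadratic data, the noise level, the helper names (fit_polynomial, mse, make_data), and the TOLERANCE rule for deciding that an error is "low enough".

```python
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (A^T A) c = A^T y."""
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - s) / m[i][i]
    return coeffs  # coeffs[k] multiplies x**k

def mse(coeffs, xs, ys):
    """Mean squared prediction error of the polynomial on (xs, ys)."""
    preds = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

def make_data(rng, n):
    """Hypothetical process: a quadratic signal plus Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x - 3.0 * x * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_data(rng, 40)  # observations we already have
new_x, new_y = make_data(rng, 40)      # "future" observations

TOLERANCE = 1.5  # assumed: accept errors within 1.5x of the best seen

errors = {}  # degree -> (error on current data, error on new data)
for degree in range(6):
    coeffs = fit_polynomial(train_x, train_y, degree)
    errors[degree] = (mse(coeffs, train_x, train_y),
                      mse(coeffs, new_x, new_y))

best_new_error = min(new for _, new in errors.values())
# Simplest model whose error on new data is close to the best achieved.
chosen = min(d for d, (_, new) in errors.items()
             if new <= TOLERANCE * best_new_error)

for degree, (cur, new) in errors.items():
    print(f"degree {degree}: current-data MSE {cur:.4f}, new-data MSE {new:.4f}")
print("simplest adequate degree:", chosen)
```

Error on the current data can only go down as the degree rises, which is why it cannot be the selection criterion; the choice is made purely on the observations the model has not seen.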
"So the fact that naive neural nets fail on out of sample data for a given problem shows that the neural network did not reach sufficient complexity."
This is one possibility. Another, MUCH more common in practice, is that your NN overfitted the in-sample data and so trivially failed at out-of-sample forecasting.
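A rough way to tell the two failure modes apart is to compare in-sample and out-of-sample error: a model that is too simple does badly on both, while an overfitted one looks perfect in-sample and only fails on new data. The sketch below uses two deliberately extreme toy models, a constant predictor and a 1-nearest-neighbour memorizer, as stand-ins; the linear data-generating process, the noise level, and all names are assumptions for illustration, not anything taken from the essay.

```python
import random

def make_data(rng, n):
    """Hypothetical process: a linear signal plus Gaussian noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(1)
train_x, train_y = make_data(rng, 50)
test_x, test_y = make_data(rng, 50)

# Too simple: ignores x and always predicts the training mean.
mean_y = sum(train_y) / len(train_y)
def constant_model(x):
    return mean_y

# Too complex: memorizes every training point, noise included.
def memorizer(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

const_in = mse(constant_model, train_x, train_y)
const_out = mse(constant_model, test_x, test_y)
memo_in = mse(memorizer, train_x, train_y)
memo_out = mse(memorizer, test_x, test_y)

print(f"constant : in-sample {const_in:.3f}, out-of-sample {const_out:.3f}")
print(f"memorizer: in-sample {memo_in:.3f}, out-of-sample {memo_out:.3f}")
```

The memorizer reproduces its training data exactly, so its in-sample error says nothing about how it will forecast; only the gap between the two numbers reveals the overfitting.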
To figure out the complexity of the process you're trying to model, you first need to be able to separate features of that process from noise, and this is far from a trivial exercise.
This essay claims to refute a popularized understanding of Occam's Razor that I myself adhere to. It is confusing me, since I hold this belief at such a deep level that it's difficult for me to examine. Does anyone see any problems in its argument, or does it seem compelling? I specifically feel as though it might be summarizing the relevant machine learning research badly, but I'm not very familiar with the field. It also might be failing to give any credit to simplicity as a general heuristic when simplicity succeeds in a specific field, and it's unclear whether such credit would be justified. Finally, my intuition is that situations in nature where there is a steady bias towards growing complexity are more common than the author claims, and that such tendencies are stronger and last longer. However, for all of this, I have no clear evidence to back up the ideas in my head, just vague notions that are difficult to examine. I'd appreciate someone else's perspective on this, as mine seems to be distorted.
Essay: http://bruce.edmonds.name/sinti/