Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Comment author: IlyaShpitser 28 July 2014 07:53:10PM *  4 points [-]

I don't follow you. Overfitting happens when your model has too many parameters, relative to the amount of data you have. It is true that linear models may have few parameters compared to some non-linear models (for example linear regression models vs regression models with extra interaction parameters). But surely, we can have sparsely parameterized non-linear models as well.


All I am saying is that if things are surprising it is either due to "noise" (variance) or "getting the truth wrong" (bias). Or both.


I agree that "models we can quickly and easily use while under publish-or-perish pressure" is an important class of models in practice :). Moreover, linear models are often in this class, while a ton of very interesting non-linear models in stats are not, and thus are rarely used. It is a pity.

Comment author: henry4k2PH4 04 August 2014 12:40:26AM 2 points [-]

A technical difficulty with saying that overfitting happens when there are "too many parameters" is that the parameters may do arbitrarily complicated things. For example they may encode C functions, in which case a model with a single (infinite-precision) real parameter can fit anything very well! Functions that are linear in their parameters and inputs do not suffer from this problem; the number of parameters summarizes their overfitting capacity well. The same is not true of some nonlinear functions.

To avoid confusion it may be helpful to define overfitting more precisely. The gist of any reasonable definition of overfitting is: If I randomly perturb the desired outputs of my function, how well can I find new parameters to fit the new outputs? I can't do a good job of giving more detail than that in a short comment, but if you feel confused about overfitting, here's a good (and famous) article about frequentist learning theory by Vladimir Vapnik that may be useful:

http://web.mit.edu/6.962/www/www_spring_2001/emin/slt.pdf