
henry4k2PH4 comments on Why the tails come apart - Less Wrong

115 points | Post author: Thrasymachus | 01 August 2014 10:41PM


Comment author: henry4k2PH4 | 04 August 2014 12:40:26AM | 2 points

A technical difficulty with saying that overfitting happens when there are "too many parameters" is that the parameters may do arbitrarily complicated things. For example, they may encode C functions, in which case a model with a single (infinite-precision) real parameter can fit anything very well! Functions that are linear in their parameters and inputs do not suffer from this problem; the number of parameters summarizes their overfitting capacity well. The same is not true of nonlinear functions in general.
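
As a toy illustration (my own sketch, not anything standard; the digit-packing trick is just the simplest such construction): a single real parameter can memorize an arbitrary finite list of binary labels by storing them in its binary digits.

```python
# Toy sketch: one real-valued "parameter" memorizes any finite set of 0/1
# labels by packing them into its binary digits, so parameter *count* says
# nothing about capacity unless the encoding is constrained.

def encode(labels):
    """Pack a list of 0/1 labels into a single number in [0, 1)."""
    return sum(y * 2.0 ** -(i + 1) for i, y in enumerate(labels))

def predict(theta, i):
    """The 'model': read the i-th binary digit of theta back out."""
    return int(theta * 2 ** (i + 1)) % 2

labels = [1, 0, 0, 1, 1, 0, 1, 0]                 # any labels we like
theta = encode(labels)                            # the single parameter
assert all(predict(theta, i) == y for i, y in enumerate(labels))
# With exact (infinite-precision) reals this works for any number of labels;
# ordinary floats limit the demo to roughly 50 of them.
```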

To avoid confusion, it may be helpful to define overfitting more precisely. The gist of any reasonable definition of overfitting is: if I randomly perturb the desired outputs of my function, how well can I find new parameters to fit the new outputs? I can't do a good job of giving more detail than that in a short comment, but if you feel confused about overfitting, here's a good (and famous) article about frequentist learning theory by Vladimir Vapnik that may be useful:

http://web.mit.edu/6.962/www/www_spring_2001/emin/slt.pdf
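
If it helps, here is a rough numerical sketch of that "perturb the outputs and refit" idea (my own toy example, not Vapnik's formalism): polynomial classes of increasing degree are asked to chase randomly perturbed targets, and the high-degree class fits the noise almost perfectly, which is exactly the warning sign this definition points at.

```python
# Toy sketch: perturb the desired outputs at random and see how well each
# model class can refit them. A class that fits random perturbations nearly
# perfectly has high capacity and is prone to overfitting.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = np.sin(np.pi * x)                             # the "desired outputs"

def training_error(degree, targets):
    """Best-fit polynomial of the given degree, scored on its own data."""
    coeffs = np.polyfit(x, targets, degree)
    return np.mean((np.polyval(coeffs, x) - targets) ** 2)

for degree in (1, 3, 9):
    # refit to 50 randomly perturbed versions of the targets
    errors = [training_error(degree, y + rng.normal(0.0, 1.0, size=x.shape))
              for _ in range(50)]
    print(f"degree {degree}: mean training MSE on perturbed targets = "
          f"{np.mean(errors):.3f}")
```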

Comment author: IlyaShpitser | 04 August 2014 04:48:38PM | 2 points

This is about "reasonable encoding," not "linearity," though. That is, linear functions of the parameters encode reasonably, but not all reasonable encodings are linear. We can define a parameter to be precisely one bit of information, and then ask for the minimum number of bits needed.
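
A toy way to see this, reusing the single-real-parameter trick from the parent comment (my own sketch): once each bit of precision is charged as a parameter, memorizing n binary labels costs n bits whether you store them in a lookup table or pack them into one real number.

```python
# Toy sketch: the digit-packing "single parameter" needs one binary digit of
# precision per memorized label, so counted in bits it is exactly as large as
# an explicit lookup table.

def encode(labels):
    return sum(y * 2.0 ** -(i + 1) for i, y in enumerate(labels))

def predict(theta, i):
    return int(theta * 2 ** (i + 1)) % 2

labels = [1, 0, 0, 1, 1, 0, 1, 0]
theta = encode(labels)

for k in (4, 8):                                  # keep only k digits of theta
    theta_k = int(theta * 2 ** k) / 2 ** k
    correct = sum(predict(theta_k, i) == y for i, y in enumerate(labels))
    print(f"{k} bits of precision -> {correct}/{len(labels)} labels recovered")
```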

I don't understand why people are so hung up on linearity.