Lumifer comments on Open thread, Nov. 16 - Nov. 22, 2015 - Less Wrong Discussion

7 Post author: MrMind 16 November 2015 08:03AM

Comment author: Lumifer 17 November 2015 06:10:32PM 1 point

There is an interesting angle to this -- I think it maps to the difference between (traditional) statistics and data science.

In traditional stats you are used to small, parsimonious models. In these small models each coefficient, each part of the model is separable in a way, it is meaningful and interpretable by itself. The big thing to avoid is overfitting.

In data science (and/or ML) a lot of models are of the sprawling black-box kind where coefficients are not separable and make no sense outside of the context of the whole model. These models aren't traditionally parsimonious either. Also, because many usual metrics scale badly to large datasets, overfitting has to be managed differently.

Comment author: bogus 17 November 2015 06:46:32PM * -1 points

In traditional stats you are used to small, parsimonious models. In these small models each coefficient, each part of the model is separable in a way, it is meaningful and interpretable by itself. The big thing to avoid is overfitting.

Keep in mind that traditional stats also includes semi-parametric and non-parametric methods. These give you models which basically manage overfitting by making complexity scale with the amount of data, i.e. they're by no means "small" or "parsimonious" in the general case. And yes, they're more similar to the ML stuff but you still get a lot more guarantees.
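
As a rough illustration of "complexity scaling with the amount of data", here is a minimal sketch of Nadaraya-Watson kernel regression in which the bandwidth shrinks as the sample grows. Everything specific in it is an assumption made for the example: the sine target, the noise level, the constant 0.2, and the names `nw_predict` and `grid_mse`. The n^(-1/5) bandwidth rate is the usual textbook choice for smooth one-dimensional targets, not something claimed in this thread.

```python
import numpy as np

def nw_predict(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimate: Gaussian-kernel weighted average of y_train."""
    diffs = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    return weights @ y_train / weights.sum(axis=1)

def grid_mse(n, noise_sd=0.3, seed=0):
    """Fit on n noisy samples and return (bandwidth, MSE against the true curve)."""
    rng = np.random.default_rng(seed)
    f = lambda x: np.sin(2 * np.pi * x)      # assumed "true" function
    x = rng.uniform(0.0, 1.0, n)
    y = f(x) + rng.normal(0.0, noise_sd, n)
    bandwidth = 0.2 * n ** (-1 / 5)          # smaller bandwidth = more flexible fit
    grid = np.linspace(0.0, 1.0, 201)
    mse = np.mean((nw_predict(x, y, grid, bandwidth) - f(grid)) ** 2)
    return bandwidth, mse

results = {n: grid_mse(n) for n in (50, 500, 2000)}
for n, (h, mse) in results.items():
    print(f"n={n:5d}  bandwidth={h:.3f}  mse={mse:.5f}")
```

The model has no fixed parameter count: with more data the bandwidth narrows, so the fit becomes more flexible while the error against the true curve should still fall.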

Also, because many usual metrics scale badly to large datasets, overfitting has to be managed differently.

I get the impression that ML folks have to be way more careful about overfitting because their methods are not going to find the 'best' fit - they're heavily non-deterministic. This means that an overfitted model has basically no real chance of successfully extrapolating from the training set. This is a problem that traditional stats doesn't have - in that case, your model will still be optimal in some appropriate sense, no matter how low your measures of fit are.

Comment author: IlyaShpitser 17 November 2015 09:53:54PM * 1 point

I think I am giving up on correcting "google/wikipedia experts," it's just a waste of time, and a losing battle anyways. (I mean the GP here).


I get the impression that ML folks have to be way more careful about overfitting because their methods are not going to find the 'best' fit - they're heavily non-deterministic. This means that an overfitted model has basically no real chance of successfully extrapolating from the training set. This is a problem that traditional stats doesn't have - in that case, your model will still be optimal in some appropriate sense, no matter how low your measures of fit are.

That said, this does not make sense to me. Bias-variance tradeoffs are fundamental everywhere.
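
A minimal sketch of that tradeoff in a plain least-squares setting, with no ML machinery involved. The setup is assumed purely for illustration (sine target, fixed design, Gaussian noise, polynomial degrees 1, 3 and 9), and `bias_variance` is a made-up helper name. It refits the same polynomial model on many noisy resamples and estimates squared bias and variance of the predictions.

```python
import numpy as np
from numpy.polynomial import Polynomial

def bias_variance(degree, n_train=30, n_reps=200, noise_sd=0.3, seed=0):
    """Monte Carlo estimate of (squared bias, variance) for a polynomial fit."""
    rng = np.random.default_rng(seed)
    f = lambda x: np.sin(2 * np.pi * x)          # assumed "true" function
    x_train = np.linspace(0.0, 1.0, n_train)     # fixed design; only the noise varies
    x_test = np.linspace(0.0, 1.0, 101)
    preds = np.empty((n_reps, x_test.size))
    for r in range(n_reps):
        y_train = f(x_train) + rng.normal(0.0, noise_sd, n_train)
        # Ordinary least-squares polynomial fit, evaluated on the test grid
        preds[r] = Polynomial.fit(x_train, y_train, degree)(x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

results = {d: bias_variance(d) for d in (1, 3, 9)}
for d, (b, v) in results.items():
    print(f"degree={d}  bias^2={b:.4f}  variance={v:.4f}")
```

Each fit here is the exact least-squares optimum for its model class, yet the low-degree model is dominated by bias and the high-degree one by variance. Being "optimal in some appropriate sense" does not remove the tradeoff.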