gwern comments on Why the tails come apart - LessWrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
(It would be nice if you would link fulltext instead of providing citations; if you don't have access to the fulltext, it's a bad idea to cite it, and if you do, you should provide it for other people who are trying to evaluate your claims and whether the paper is relevant or wrong.)
I've put up the first paper at https://dl.dropboxusercontent.com/u/85192141/1971-goldberg.pdf / https://pdf.yt/d/Ux7RZXbo0n374dUU. I don't think this is particularly relevant: it only shows that 2 very specific equations (pg4, #3 & #4) did not outperform the linear model on a particular dataset. Too bad for Einhorn 1971.
Your second paper doesn't support the claims:
These aren't very good methods for extracting the full measure of information.
So to summarize: reality isn't entirely linear, so nonlinear methods frequently excel now that modern developments in regularization let them avoid overfitting (we can see this in the low prevalence of linear methods in demanding AI tasks like image recognition, or more generally, in competitions like Kaggle across all sorts of domains). To the extent that humans are also good predictors and classifiers of reality, their predictions/classifications will be better mimicked by nonlinear methods. Research showing the contrary typically does not compare against very good methods, and much more recent research may do much better (for example, parole/recidivism predictions by parole boards may be bad and easily improved on by linear models, but does that mean algorithms can't do even better?). And to the extent linear methods succeed, it may reflect the lack of relevant data or the inherent randomness of results for a particular cherrypicked task.
To show your original claim ("in many fields, linear models (even poor ones) are the best we're going to get, with more complex models losing to overfitting"), I would want to see linear models steadily beat all comers, from random forests to deep neural networks to ensembles of all of the above, on a wide variety of large datasets. I don't think you can show that.
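To make the shape of that comparison concrete, here is a minimal sketch of a held-out test between a linear model and one simple nonlinear method. Everything in it is an assumption for illustration: the data are synthetic, k-nearest-neighbour regression merely stands in for "a nonlinear method", and the names `fit_linear`, `fit_knn`, and `compare` are hypothetical. It is not the data or procedure of any paper cited in this thread, and a real test would need many large real-world datasets and much stronger competitors.

```python
import random

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(xs, ys, k=5):
    """k-nearest-neighbour regression: average the k closest training targets."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def compare(truth, n_train=200, n_test=100, noise=0.5, seed=0):
    """Fit both models on a training sample, score both on a separate test sample."""
    rng = random.Random(seed)  # fixed seed so the comparison is repeatable
    def sample(n):
        xs = [rng.uniform(-3, 3) for _ in range(n)]
        ys = [truth(x) + rng.gauss(0, noise) for x in xs]
        return xs, ys
    train_x, train_y = sample(n_train)
    test_x, test_y = sample(n_test)
    return {
        "linear": mse(fit_linear(train_x, train_y), test_x, test_y),
        "knn": mse(fit_knn(train_x, train_y), test_x, test_y),
    }

if __name__ == "__main__":
    # Assumed synthetic "realities": one nonlinear, one genuinely linear.
    print("quadratic truth:", compare(lambda x: x * x))
    print("linear truth:   ", compare(lambda x: 2 * x))
```

The point of scoring on a separate test sample is that overfitting shows up as worse held-out error, so whichever model wins there wins on the terms the original claim sets. On the quadratic toy problem the linear fit cannot capture the curvature at all; whether anything similar holds in a given field is exactly the empirical question.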
I tend to agree with you about models, once overfitting is sorted.
This I've still seen no evidence for.