From Cafe Hayek (original): Two meteorologists have announced that they will stop using certain forecast methods, even though they've used them for 20 years.

There's a correction at the end of the article, too!


We are discontinuing our early December quantitative hurricane forecast for the next year … Our early December Atlantic basin seasonal hurricane forecasts of the last 20 years have not shown real-time forecast skill even though the hindcast studies on which they were based had considerable skill.

Emphasis mine. The importance of cross-validation!
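To make that concrete, here's a minimal sketch (Python, with synthetic data and made-up dimensions, not anything from the actual forecast models) of how a regression can show strong in-sample "hindcast" skill while cross-validation reveals no real forecast skill at all:

```python
# A sketch, not the forecasters' actual method: synthetic data, arbitrary sizes.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
n_years, n_predictors = 20, 15          # nearly as many predictors as years
X = rng.normal(size=(n_years, n_predictors))
y = rng.normal(size=n_years)            # pure noise: there is no signal to find

model = LinearRegression()
hindcast_r2 = model.fit(X, y).score(X, y)
cv_r2 = cross_val_score(model, X, y, cv=KFold(n_splits=5), scoring="r2")

print(f"hindcast (in-sample) R^2: {hindcast_r2:.2f}")  # high, despite pure noise
print(f"cross-validated R^2:     {cv_r2.mean():.2f}")  # near zero or negative
```

With nearly as many predictors as years of data, the fit can "explain" almost anything in hindsight while predicting nothing out of sample.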

This reminds me of an article I read recently (and cannot find for the life of me) about the calibration of this type of model. Essentially, the author was pointing out that the curves fitted to past data to "train" these models frequently have more degrees of freedom than there are data points in the training set. For those of you who aren't familiar with curve-fitting, this means there are literally infinitely many curves that fit the data exactly, so the probability that your algorithm happens to pick one that also models the future with an acceptable degree of accuracy is small.
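To illustrate (a toy sketch in Python/NumPy, not the article's actual models): fit a degree-5 polynomial, which has six coefficients, to three data points. The system is underdetermined, and you can construct as many distinct exact fits as you like:

```python
# A toy sketch: six polynomial coefficients, three data points.
import numpy as np

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 2.0])
V = np.vander(x, N=6)                    # degree-5 fit: 6 unknowns, 3 equations

# One exact fit (the minimum-norm least-squares solution)...
coef_a, *_ = np.linalg.lstsq(V, y, rcond=None)

# ...and another: add any vector from the null space of V.
_, _, Vt = np.linalg.svd(V)
coef_b = coef_a + 10.0 * Vt[-1]          # Vt's last row lies in the null space

print(np.allclose(V @ coef_a, y))        # True: passes through all three points
print(np.allclose(V @ coef_b, y))        # True: so does this very different curve
print(np.polyval(coef_a, 3.0), np.polyval(coef_b, 3.0))  # they diverge off the data
```

Both coefficient vectors reproduce the training data perfectly, yet they disagree anywhere else, which is exactly why in-sample fit tells you nothing here.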

I'll try and find the article again so I can link it.

EDIT: The article can be found here. It focuses on economic modeling, but the basic techniques are the same as those used in many other fields (including meteorology and climate science).

'Overfitting', yes? I think I may have learned about that from Nate Silver.

Overfitting is one of the types of error that can crop up here, but the error the article refers to is the kind you get when you run a linear regression on a data set containing one point: there are infinitely many optimally-fit solutions that model the data equally well.
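For instance (a toy illustration, not from the article): with a single observation and two free parameters, every slope yields a zero-residual line once you choose the matching intercept:

```python
# A toy illustration: two free parameters, one data point.
x0, y0 = 2.0, 5.0
for m in (0.0, 1.0, -3.0, 100.0):
    b = y0 - m * x0                      # pick the intercept that hits the point
    print(f"m={m:6.1f}, b={b:7.1f}, residual={y0 - (m * x0 + b)}")  # always 0.0
```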

Er, I'm not sure what distinction you mean here. Overfitting is the superclass of that, not the subclass: overfitting still describes this problem even when no curve can describe your data perfectly but there are many ways to fit it optimally.

My mistake, I thought you were referring to overfitting with the connotation of a deliberate choice, like the manager who thinks he should fit a 9th-degree polynomial to some essentially linear data because "the line gets closer".
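For what it's worth, that manager's mistake is easy to reproduce (a hedged sketch with synthetic, essentially linear data; the numbers are made up): the 9th-degree polynomial does "get closer" on the training points while getting worse everywhere else:

```python
# Synthetic, essentially linear data; the noise level and seed are made up.
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + rng.normal(scale=0.3, size=x_train.size)
x_test = np.linspace(0.0, 1.0, 200)
y_test = 2.0 * x_test                    # the true underlying relationship

for degree in (1, 9):
    coef = np.polyfit(x_train, y_train, degree)   # degree 9 interpolates the noise
    train_mse = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```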

The models used for economic or climate data are usually based on theory, giving them a sensible number of degrees of freedom that may or may not match up with how much calibration data is available; I would not class this as overfitting in the common use of the term, as all the degrees of freedom do have a legitimate reason to be there.