Suppose we have a bunch of (forecasted value, actual value) pairs for a given quantity, with the actual value measured at different times. An example would be the GDP growth rate measured in different years: for each year, we have a forecasted value and an actual value, so we have one (forecasted value, actual value) pair per year. How do we judge how useful the forecasts are at predicting the actual values? Here, we discuss a few related measures: accuracy, bias, and dependency (specifically, correlation).

Accuracy

The accuracy of a forecast refers to how far, on average, the forecast is from the actual value. Two typical ways of measuring the accuracy are:

  • Compute the mean absolute error: Take the arithmetic mean (average) of the absolute values of the errors for each forecast.
  • Compute the root mean square error: Take the square root of the arithmetic mean of the squares of the errors.

The size of the error, measured in either of these ways, is a rough estimate of how accurate the forecasts are in general (the larger the error, the less accurate the forecast). Note that an error of zero represents a perfectly accurate forecast.
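As a concrete illustration, here is a minimal sketch of both measures in Python; the `forecast` and `actual` arrays are made-up numbers used purely for illustration:

```python
# A minimal sketch of the two accuracy measures described above.
import numpy as np

forecast = np.array([2.1, 3.0, 1.5, 2.8, 2.2])  # hypothetical forecasted values
actual   = np.array([1.8, 3.4, 1.2, 2.5, 2.0])  # hypothetical actual values

errors = forecast - actual
mae  = np.mean(np.abs(errors))        # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))  # root mean square error
print(mae, rmse)
```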

Note that this is a global measure of accuracy. But it may be the case that forecasts are more accurate when the actual values are at a particular level, and less accurate when they are at a different level. There are mathematical models to test for this.

Bias

When we ask whether the forecast is biased, we're interested in knowing whether the errors systematically lean in one direction, i.e., whether the errors in the positive direction tend to outweigh the errors in the negative direction, or vice versa. One method for estimating this is to compute the mean signed difference (i.e., take the arithmetic mean of the errors of the individual forecasts without taking absolute values). If this comes out as zero, then the forecasting is unbiased. If it comes out as positive, the forecasts are biased in the positive direction, whereas if it comes out as negative, the forecasts are biased in the negative direction.

The above is a start, but it's not good enough. In particular, the error could come out nonzero simply because of random fluctuations rather than bias. We'd need to complicate the model somewhat in order to make probabilistic or quantitative assessments to get a sense of whether or how the forecasts are really biased.
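One simple way to make such an assessment (a sketch of one option, not the only one) is a one-sample t-test on the errors, which asks how surprising the observed mean signed error would be if the true bias were zero. It assumes the errors are roughly independent and approximately normal, which may not hold in practice; the arrays below are made-up numbers:

```python
# A sketch of a simple bias check: the mean signed error, plus a one-sample
# t-test of whether the mean error differs from zero.
import numpy as np
from scipy import stats

forecast = np.array([2.1, 3.0, 1.5, 2.8, 2.2])  # hypothetical values
actual   = np.array([1.8, 3.4, 1.2, 2.5, 2.0])

errors = forecast - actual
mean_signed_error = errors.mean()               # positive => biased upward
t_stat, p_value = stats.ttest_1samp(errors, 0)  # H0: the mean error is zero
print(mean_signed_error, p_value)
```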

Again, the above is a global measure of bias. But it may be the case that there are different biases for different values. There are mathematical models to test for this.

Are accuracy and bias related? Yes, in the obvious sense that the degree of inaccuracy gives an upper bound on the degree of bias. In particular, for instance, the mean absolute error gives an upper bound on the absolute value of the mean signed difference. So a perfectly accurate forecast is also unbiased. However, we can have fairly inaccurate forecasts that are unbiased. For instance, a forecast that always guesses the mean of the distribution of actual values will be inaccurate but have zero bias.

The above discusses additive bias. There may also be multiplicative bias. For instance, the forecasted value may be reliably half the actual value. In this case, doubling the forecasted value allows us to obtain the actual value. There could also be forms of bias that are not captured in either way.

Dependency and correlation

Ideally, what we want to know is not so much whether the forecasts themselves are accurate or biased, but whether we can use them to generate new forecasts that are good. So what we want to know is: once we correct for bias (of all sorts, not just additive or multiplicative), how accurate is the new forecast? Another way of framing this is: what exactly is the nature of dependency between the variable representing the forecasted value and the variable representing the actual value?

Testing for the nature of the dependency between variables is a hard problem, particularly if we don't have a prior hypothesis for the nature of the dependency. If we do have a hypothesis, and the relation is linear in unknown parameters, we can use the method of ordinary least squares regression (or another suitable regression) to find the best fit. And we can measure the goodness of that fit through various statistical indicators.

In the case of linear regression (i.e., trying to fit using a linear functional dependency between the variables), the square of the correlation between the variables is the R2 of the regression, and offers a decent measure of how close the variables are to being linearly related. A correlation of 1 implies an R2 of 1, and implies that the variables are perfectly correlated, or equivalently, that a linear function with positive slope is a perfect fit. A correlation of -1 also implies an R2 of 1, and would mean that a linear function with negative slope is a perfect fit. A correlation of zero means that the variables are completely uncorrelated.
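As a quick check of the relationship between correlation and R2, here is a sketch using made-up `forecast` and `actual` arrays; the squared Pearson correlation matches the R2 of the simple regression of actual on forecast:

```python
# A sketch showing that, for a simple linear regression of actual on forecast,
# the squared Pearson correlation equals the regression's R2.
import numpy as np
from scipy import stats

forecast = np.array([2.1, 3.0, 1.5, 2.8, 2.2])  # hypothetical values
actual   = np.array([1.8, 3.4, 1.2, 2.5, 2.0])

r = np.corrcoef(forecast, actual)[0, 1]   # Pearson correlation
fit = stats.linregress(forecast, actual)  # OLS fit: actual ~ forecast
print(r ** 2, fit.rvalue ** 2)            # the two values agree
```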

Note also that linear regression covers both additive and multiplicative bias (and combinations thereof) and is often good enough to capture the most basic dependencies.

If the value of R2 for the linear regression is zero, that means the variables are uncorrelated. Although independent implies uncorrelated, uncorrelated does not imply independent, because there may be other nonlinear dependencies that miraculously give zero correlation. In fact, uncorrelated does not imply independent even if the variables are both normally distributed. As a practical matter, a correlation of zero is often taken as strong evidence that neither variable tells us much about the other. This is because even if the relationship isn't linear, the existence of some relationship makes a nonzero correlation more plausible than an exact zero correlation. For instance, if the variables are positively related (higher forecasted values predict higher actual values) we expect a positive correlation and a positive R2. If the variables are negatively related (higher forecasted values predict lower actual values) we expect a negative correlation, but still a positive R2.

For the trigonometrically inclined: The Pearson correlation coefficient, simply called the correlation here, measures the cosine of the angle between a vector based on the forecasted values and a vector based on the actual values. The vector based on the forecasted values is obtained by starting with the vector of the forecasted values and subtracting from each coordinate the mean forecasted value. Similarly, the vector based on the actual values is obtained by starting with the vector of the actual values and subtracting from each coordinate the mean actual value. The R2 value is the square of the correlation, and measures the proportion of variance in one variable that is explained by the other (this is sometimes referred to as the coefficient of determination). 1 - R2 represents the square of the sine of the angle between the vectors, and represents how alienated the vectors are from each other. A correlation of 1 means the vectors are collinear and point in the same direction, a positive correlation less than 1 means they form an acute angle, a zero correlation means they are at right angles, a negative correlation greater than -1 means they form an obtuse angle, and a correlation of -1 means the vectors are collinear and point in opposite directions.
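A small sketch of this geometric reading, with made-up numbers: center each vector by subtracting its mean, and the cosine of the angle between the centered vectors equals the Pearson correlation:

```python
# Center each vector by subtracting its mean; the cosine of the angle between
# the centered vectors is the Pearson correlation.
import numpy as np

forecast = np.array([2.1, 3.0, 1.5, 2.8, 2.2])  # hypothetical values
actual   = np.array([1.8, 3.4, 1.2, 2.5, 2.0])

f = forecast - forecast.mean()
a = actual - actual.mean()
cosine = f @ a / (np.linalg.norm(f) * np.linalg.norm(a))
print(cosine, np.corrcoef(forecast, actual)[0, 1])  # these agree
```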

Usefulness versus rationality

The simplest situation is where the forecasts are completely accurate. That's perfect. We don't need to worry about doing better.

In the case that the forecasts are not accurate, and if we have had the luxury of crunching the numbers and figuring out the nature of dependency between the forecasted and actual values, we'd want a situation where the actual value can be reliably predicted from the forecasted value, i.e., the actual value is a (known) function of the forecasted value. A simple case of this is where the actual value and forecasted value have a correlation of 1. This means that the actual value is a known linear function of the forecasted value. (UPDATE: This process of using a known linear function to correct for systematic additive and multiplicative bias is known as Theil's correction). So the forecasted value itself is not good, but it allows us to come up with a good forecast.
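As a rough illustration of this kind of correction, here is a minimal sketch in Python; the `forecast` and `actual` arrays and the new forecasted value are made-up numbers used only for illustration:

```python
# A sketch of the linear (Theil-style) correction described above: regress the
# actual values on the forecasted values, then use the fitted line to turn a
# new forecasted value into a bias-corrected forecast.
import numpy as np
from scipy import stats

forecast = np.array([2.1, 3.0, 1.5, 2.8, 2.2])  # hypothetical past forecasts
actual   = np.array([1.8, 3.4, 1.2, 2.5, 2.0])  # corresponding actual values

fit = stats.linregress(forecast, actual)
new_forecast = 2.6                               # hypothetical new forecast
corrected = fit.intercept + fit.slope * new_forecast
print(corrected)
```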

What would it mean for a forecast to be unimprovable? Essentially, it means that the best value we can forecast based on the forecasted value is the forecasted value itself. Wait, what? What we mean is that the forecasters aren't leaving any money on the table: if they could improve the forecast simply by correcting for a known bias, they have already done so. Note that a forecast being unimprovable does not say anything directly about the R2 value. Rather, unimprovability suggests that the best functional fit between the forecasted and the actual value would be the identity function (actual value = forecasted value). For the linear regression case, it suggests that the slope of the linear regression is 1 and the intercept is 0. Or at any rate, that they are close enough. Note that a forecast can be unimprovable while still being completely useless (for instance, a forecast that always guesses the mean of the distribution of actual values).
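One crude way to check this for the linear case is to look at the fitted slope and intercept directly. The sketch below uses made-up numbers and arbitrary tolerance thresholds, purely to illustrate the idea:

```python
# A sketch of the linear unimprovability check suggested above: if the best
# linear fit of actual on forecast has slope near 1 and intercept near 0,
# there is no simple linear correction left to make.
import numpy as np
from scipy import stats

forecast = np.array([2.1, 3.0, 1.5, 2.8, 2.2])  # hypothetical values
actual   = np.array([1.8, 3.4, 1.2, 2.5, 2.0])

fit = stats.linregress(forecast, actual)
unimprovable_linearly = abs(fit.slope - 1) < 0.1 and abs(fit.intercept) < 0.1
print(fit.slope, fit.intercept, unimprovable_linearly)
```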

The following table captures the logic (note that the two rows just describe the extreme cases, rather than the logical space of all possibilities).

| | The forecast cannot be improved upon | The forecast can be improved upon |
|---|---|---|
| The forecast, once improved upon, is perfect | The forecasted value equals the actual value. | The forecasted value predicts the actual value perfectly, but is not itself perfect. For instance, they could have a correlation of 1, in which case the prediction would be via a linear function. |
| The forecast, even after improvement, is useless at the margin (i.e., it does not give us information we didn't already have from knowledge of the existing distribution of actual values) | The forecast just involves perfectly guessing the mean of the distribution of actual values (assuming that the distribution is known in advance; if it's not, then things become even more murky). | The actual value is independent of the forecast, and it does not involve simply guessing the mean. |

Note that if forecasters are rational, then we should be in the column "The forecast cannot be improved upon" and therefore between the extreme case that the forecast is already perfect and that the forecast just involves guessing the mean of the distribution (assuming that the distribution is known in advance).

So there are two real and somewhat distinct questions about the value of forecasts:

  • (The question whose extreme answers give the rows): How useful are the forecasts, in the sense that, once we extract all the information from them by correcting for bias and applying the appropriate functional form, how accurate are the new forecasts?
  • (The question whose answers give the columns): How rational are the forecasters, in the sense of how close are their forecasts to the most useful forecasts that can be extracted from those forecasts? (Note that even if the forecasts cannot be improved upon, that doesn't mean the forecasts are rational in the broader sense of making the best guess in terms of all available information, but it is in any case consistent with rationality in this broader sense).

Background reading

For more background, see the Wikipedia pages on forecast bias and bias of an estimator and the content linked therein.

Comments

This comment applies to all of your recent posts.

I notice that you wrote a bunch of loosely related posts on forecasting but got few comments and votes for them, and I want to comment on that. I'm not sure what the exact reason is, but I guess it's multi-factored.

  • Some posts are lengthy. I think you could assume a bit more prior knowledge and shorten them (by removing explanations or assumptions).

  • Some posts do not have a clear thread or result or key question. Try to make a point.

  • I get the impression that they are not (made) interesting enough, especially considering the number of posts. Some posts seem to be just accompanying material. You could post these as comments on one main post.

  • There is not a single place to comment on these. No single post stands out and binds them together.

  • The sequence lacks cohesion. Actually, it is not a sequence but looks like a bunch of loosely related posts. You should add "Followup" or "Related" links. You could have tried to make it into a sequence by building on prior posts and working toward one key result.

I think the word you're looking for is "brain dumps" :-)

I'm ambivalent about brain dumps on LW. I can see both pros and cons.

Thanks for your thoughts. My purpose with the posts was more to lay out some of my preliminary thoughts on the subject and get any feedback people have, i.e., to check my thinking.

I do understand that many of the posts aren't awesome and won't get upvotes, and that's something I can accept (the marginal cost of polishing the posts to the point where they look really exciting probably isn't worth the benefit, if the posts can be reworked at all). The small amounts of net upvotes do seem to add up. Some of my posts (such as http://lesswrong.com/r/discussion/lw/jj1/supply_demand_and_technological_progress_how/ and http://lesswrong.com/lw/k25/beware_technological_wonderland_or_why_text_will/) did end up getting a lot of upvotes eventually. Some of my posts, such as this one, are more of "utility" posts that I will refer to in later posts. I should note that I am not personally too concerned with karma; as long as I am not being heavily downvoted on net, I think that posting passes a cost-benefit analysis for me.

While I do have a broader framework of where I'm going with the posts, the framework itself is changing frequently in my head, and I don't think it is productive to lay it out too much right now while it's still in so much flux. But I will periodically review and add more links between the posts. Once I am done with all posts, I will do a more thorough cleanup.

Ideally, what we want to know is not so much whether the forecasts themselves are accurate or biased, but whether we can use them to generate new forecasts that are good.

That's a weird approach. Normally, your forecasts are generated by some model. That model is usually fitted to historical data and then used to produce forecasts. If the forecasts are bad, you need to fix the model, not overlay another model on top of it.

Normally, your forecasts are generated by some model. That model is usually fitted to historical data and then used to produce forecasts. If the forecasts are bad, you need to fix the model, not overlay another model on top of it.

The forecasts may be coming from some external agency, not from me (for instance, they may be consensus forecasts generated by a poll). My goal is to use that number and come up with a better forecast from it. I could frame it as changing the model, but since I don't control the publication of the original forecast, I'm conceptualizing it as coming up with a new forecast.

Many forecasts have been observed to have systematic biases of some sorts, so that explicitly correcting for these biases gives more accurate forecasts. After I published the post, I came across the name for using linear regression to improve forecasts: it's called Theil's correction. If you're controlling the forecast yourself, you can apply the correction before you publish the forecast. If the publication of the forecast is outside of your control, however, you need to apply the correction afterwards.

For an example of a paper that uses this sort of approach, see http://forecasters.org/ijf/journal-issue/489/article/7093

The forecasts may be coming from some external agency, not from me... My goal is to use that number and come up with a better forecast from it.

In this case calling your inputs "forecasts" is just confusing. For you they are nothing but data on the basis of which you will build your own model to produce your forecasts.

In this framework you're just doing normal forecasting and should use all the normal tools. There's no reason to limit yourself to OLS regression, for example.

OLS regression isn't the only tool, but it is the most standard one to fit a functional form. One can use other kinds of regressions. My focus was on techniques that can use existing forecast estimates in a black-box fashion rather than those that require one to create new models of one's own on the evolution of the relevant processes.

but it is the most standard one to fit a functional form

It is the most simple one and probably the most widely used, though often inappropriately.

My focus was on techniques that can use existing forecast estimates in a black-box fashion rather than those that require one to create new models of one's own on the evolution of the relevant processes.

As soon as you do something with "existing forecast estimates" other than just accepting them, you are creating a new model of your own. You want to correct them for bias? That's a model you've created.

If you use external forecasts as data, as inputs, you are using them in "black-box fashion".