wnoise comments on The Optimizer's Curse and How to Beat It - Less Wrong

Post author: lukeprog 16 September 2011 02:46AM

Comment author: wnoise 16 September 2011 08:21:29AM 0 points

Yes, the technical definition is E[estimate - parameter], but "unbiased" has an implicit "for all parameter values". You really can't stick a random variable there and have the same meaning that statisticians use. (That said, I don't see how DanielLC's reformulation makes sense.)

Comment author: Matt_Simpson 16 September 2011 04:01:16PM 0 points

It won't have the same meaning, but nothing in the math prevents you from doing it, and it might be more informative, since it lets you look at a single bias number instead of an uncountable set of biases (and Bayesian decision theory essentially does this). To be a little more explicit, the technical definition of bias is:

E[estimator|true value] - true value

And if we want to minimize bias, we try to do so over all possible values of the true value. But we can easily integrate over the space of the true value (assuming some prior over it) to get

E[ E[estimator|true value] - true value ] = E[ estimator - true value ]

This is similar to the Bayes risk of the estimator with respect to some prior distribution (the difference is that we don't have a loss function here). By analogy, I might call this "Bayes bias."
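A quick numerical sketch of the distinction (the estimator, sample size, and uniform prior here are my own illustrative choices, not anything from the thread): a shrunken estimator of a Bernoulli parameter has nonzero bias at almost every fixed parameter value, but integrating that bias against a prior collapses it to a single "Bayes bias" number.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: estimate a Bernoulli parameter p from n = 10 flips
# with the shrunken estimator (heads + 1) / (n + 2).  This estimator is
# biased at every fixed p != 0.5.
n = 10

# Pointwise bias E[estimator | p] - p, computed exactly:
# E[heads | p] = n * p, so E[estimator | p] = (n*p + 1) / (n + 2).
def bias(p):
    return (n * p + 1) / (n + 2) - p

# The bias depends on the true value: positive below 0.5, negative above.
print(bias(0.2))  # 0.05
print(bias(0.8))  # -0.05

# "Bayes bias": integrate the pointwise bias over a uniform prior on p
# (Monte Carlo approximation of the integral).
ps = rng.uniform(0, 1, size=200_000)
bayes_bias = np.mean(bias(ps))
print(bayes_bias)  # ~0 under the uniform prior, by symmetry
```

So the "uncountable set of biases" (one per p) is summarized by one number, at the cost of that number depending on the chosen prior.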

The only issue is that your estimator may be right on average, yet that doesn't mean it will be anywhere close to the true value. Usually bias is used along with the variance of the estimator (since MSE(estimator) = Variance(estimator) + [Bias(estimator)]^2), but we could modify our definition of Bayes bias to take the absolute value of the difference inside the expectation, so that we only have to look at one number: values closer to zero mean better estimators. Then we're just calculating the Bayes risk with respect to some prior and absolute error loss, i.e.

E[ | estimator - true value | ]

(Which is NOT in general equivalent to | E[estimator - true value] | by Jensen's inequality)
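A small Monte Carlo sketch of both points (the true value, offset, and sample size are illustrative assumptions, not from the comment): the MSE decomposition holds for a deliberately biased estimator of a normal mean, and the absolute-value versions differ exactly as Jensen's inequality predicts.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: true value 2.0; estimator = sample mean of n normal
# draws plus a constant offset 0.3, so it has both bias and variance.
true_value = 2.0
n = 5
estimates = rng.normal(loc=true_value, scale=1.0,
                       size=(100_000, n)).mean(axis=1) + 0.3

bias = estimates.mean() - true_value          # ~0.3 (the offset)
variance = estimates.var()                    # ~1/n = 0.2
mse = np.mean((estimates - true_value) ** 2)

# MSE decomposition: MSE = Variance + Bias^2 (an algebraic identity,
# so it holds to floating-point precision in the sample too).
print(mse, variance + bias ** 2)

# Jensen's inequality: |E[estimator - true value]| <= E[|estimator - true value|]
lhs = abs(np.mean(estimates - true_value))
rhs = np.mean(np.abs(estimates - true_value))
print(lhs, rhs)  # lhs < rhs: the two summaries genuinely differ
```

Here the inequality is strict because the error distribution puts mass on both sides of zero; the absolute-value version penalizes spread, not just systematic offset.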