# Matt_Simpson comments on The Optimizer's Curse and How to Beat It - Less Wrong

44 16 September 2011 02:46AM



Comment author: Matt_Simpson 16 September 2011 04:01:16PM 0 points

It won't have the same meaning, but nothing in the math prevents you from doing it, and it might be more informative, since it lets you look at a single bias number instead of an uncountable set of biases, one for each possible true value (Bayesian decision theory essentially does this). To be a little more explicit, the technical definition of bias is:

E[estimator|true value] - true value
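To make that definition concrete, here is a minimal Monte Carlo sketch. Everything specific in it is my own assumption for illustration: the data are a single draw from Normal(true value, 1), the estimator is a hypothetical shrinkage rule `shrink` that halves the observation, and `conditional_bias` is just a helper name I made up.

```python
import random

def conditional_bias(estimator, true_value, noise_sd=1.0, n_draws=100_000, seed=0):
    """Monte Carlo estimate of E[estimator | true value] - true value.

    Assumes (for illustration only) one observation ~ Normal(true_value, noise_sd).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        observation = rng.gauss(true_value, noise_sd)
        total += estimator(observation)
    return total / n_draws - true_value

def shrink(observation):
    """Hypothetical estimator: shrink the observation halfway toward zero."""
    return 0.5 * observation

# The bias depends on which true value we condition on:
# analytically it is (0.5 - 1) * true_value for this estimator.
bias_at_2 = conditional_bias(shrink, 2.0)
bias_at_minus_2 = conditional_bias(shrink, -2.0)
print(bias_at_2, bias_at_minus_2)
```

The point of the example is only that the bias is a function of the true value, so there is one bias number per possible true value.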

And if we want to minimize bias, we try to do so over all possible values of the true value. But we can just as easily integrate over the space of true values (assuming some prior over the true value) to get

E[ E[estimator|true value] - true value ] = E[ estimator - true value ]

This is similar to the Bayes risk of the estimator with respect to some prior distribution (the difference is that we don't have a loss function here). By analogy, I might call this "Bayes bias."
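As a rough sketch of that integration, we can draw the true value from the prior, draw data given the true value, and average the error. The normal prior, the normal noise, the hypothetical `shrink` estimator, and the helper name `bayes_bias` are all assumptions of mine, not anything from the post.

```python
import random

def bayes_bias(estimator, prior_mean=0.0, prior_sd=1.0, noise_sd=1.0,
               n_draws=200_000, seed=0):
    """Monte Carlo estimate of E[estimator - true value] under a prior.

    Assumes (for illustration) true value ~ Normal(prior_mean, prior_sd)
    and one observation ~ Normal(true value, noise_sd).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        true_value = rng.gauss(prior_mean, prior_sd)      # draw from the prior
        observation = rng.gauss(true_value, noise_sd)     # draw data given it
        total += estimator(observation) - true_value
    return total / n_draws

def shrink(observation):
    """Hypothetical estimator: shrink the observation halfway toward zero."""
    return 0.5 * observation

# Under a prior centered at zero, the positive and negative conditional
# biases cancel, so the single "Bayes bias" number is about 0.
centered = bayes_bias(shrink, prior_mean=0.0)

# Under a prior centered at 3, the same estimator has Bayes bias of
# about -0.5 * 3 = -1.5.
off_center = bayes_bias(shrink, prior_mean=3.0)
print(centered, off_center)
```

Note that the same estimator gets a different Bayes bias under a different prior, which is the sense in which the number doesn't have the same meaning as ordinary bias.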

The only issue is that your estimator may be right on average without being anywhere close to the true value in any given case. Usually bias is used along with the variance of the estimator (since MSE(estimator) = Variance(estimator) + [Bias(estimator)]^2 ), but we could instead modify our definition of Bayes bias to take the absolute value of the difference, so that we only have to look at one number, with values closer to zero meaning better estimators. Then we're just calculating Bayes risk with respect to some prior and absolute error loss, i.e.

E[ | estimator - true value | ]

(Which is NOT in general equal to | E[estimator - true value] |; by Jensen's inequality it is at least as large.)
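A quick simulation shows how far apart the two quantities can be. This is only a sketch under assumed conditions: a Normal(0, 1) prior, one Normal(true value, 1) observation, and a hypothetical `shrink` estimator that halves the observation. In that setup the error is Normal(0, 0.5), so E[ | estimator - true value | ] is 1/sqrt(pi), roughly 0.564, while | E[estimator - true value] | is 0.

```python
import random

def shrink(observation):
    """Hypothetical estimator: shrink the observation halfway toward zero."""
    return 0.5 * observation

def error_summaries(estimator, prior_sd=1.0, noise_sd=1.0, n_draws=200_000, seed=0):
    """Return (E|estimator - true value|, |E[estimator - true value]|) by Monte Carlo.

    Assumes (for illustration) true value ~ Normal(0, prior_sd) and one
    observation ~ Normal(true value, noise_sd).
    """
    rng = random.Random(seed)
    sum_error = 0.0
    sum_abs_error = 0.0
    for _ in range(n_draws):
        true_value = rng.gauss(0.0, prior_sd)
        observation = rng.gauss(true_value, noise_sd)
        error = estimator(observation) - true_value
        sum_error += error
        sum_abs_error += abs(error)
    mean_abs_error = sum_abs_error / n_draws      # Bayes risk, absolute error loss
    abs_mean_error = abs(sum_error / n_draws)     # absolute value of "Bayes bias"
    return mean_abs_error, abs_mean_error

mean_abs_error, abs_mean_error = error_summaries(shrink)
print(mean_abs_error, abs_mean_error)
```

So an estimator can have a Bayes bias of essentially zero and still miss the true value by a fair amount on a typical draw.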