Oscar_Cunningham comments on The Optimizer's Curse and How to Beat It - Less Wrong
Comments (76)
Is there an example where applying this correction to the expected values changes the decision?
In any group there's going to be random noise, and if you choose an extreme value, chances are that value was inflated by noise. In Bayesian terms, given that something has the highest value, it probably had positive noise, not just positive signal. So the correction is to subtract out the expected positive noise you get from explicitly choosing the highest value. Naturally, this correction is greater when the noise is bigger.
So imagine choosing between black boxes. Each black box has some number of gold coins in it, and also two numbers written on it. The first number, A, on the box is like the estimated expected value, and the second number, B, is like the variance. What happened is that someone rolled two distinct dice with B sides, subtracted die 1 from die 2, and added the result to the number of gold coins in the box to get A.
So if you see a box with 40, 3 written on it, you know that it has an expected value of 40 gold coins, but might have as few as 37 or as many as 43.
Now comes the problem: I put 10 boxes in front of you, and tell you to choose the one with the most gold coins. The first box is 50, 1 - a very low-variance box. But the last 9 boxes are all high-uncertainty, all with B=20. The expected values printed on them are as follows [I generated the boxes honestly] : 53, 52, 37, 60, 44, 36, 56, 45, 54. Ooh, one of those boxes has a 60 on it! Pick that one!
Okay, don't pick that one. Think about it - there are 9 boxes with high variance, and the one you picked probably has unusually large noise. To be special among 9 proposals with high variance, it probably has noise at the 80th+ percentile. What's the 80th percentile of noise for 1d20 - 1d20? I bet it's larger than 10. You're better off just going with the 50, 1 box.
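The percentile question above can be checked by brute force. The following is a minimal sketch, assuming two fair dice with faces 1 to B as in the box description; the function name `noise_percentile` is made up for illustration.

```python
from collections import Counter
from itertools import product


def noise_percentile(sides, pct):
    """Smallest difference d with P(die2 - die1 <= d) >= pct percent.

    Assumes two fair dice with faces 1..sides (an assumption taken from
    the box description, not from any particular generator).
    """
    faces = range(1, sides + 1)
    # Count how many of the sides*sides equally likely rolls give each difference.
    counts = Counter(d2 - d1 for d1, d2 in product(faces, repeat=2))
    total = sides * sides
    cumulative = 0
    for diff in sorted(counts):
        cumulative += counts[diff]
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if cumulative * 100 >= pct * total:
            return diff


print(noise_percentile(20, 80))  # 7
```

Under these assumptions the 80th percentile of 1d20 - 1d20 comes out at 7, so the guess of "larger than 10" looks somewhat high, although the noise is still large compared with the gaps between the labels.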
And it's a good thing you applied that correction, because I generated the boxes by typing "RandomInteger[20,9] - RandomInteger[20,9] + 45" into Wolfram Alpha - they each have 45 coins.
So this illustrates that beating the optimizer's curse really amounts to a sort of "correction for multiple comparisons." If you have a lot of noisy boxes, some of them will look large even when they're not, even larger than non-noisy boxes.
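To see how often a noisy box "looks large" purely by chance, here is a rough Monte Carlo sketch. It assumes nine boxes that all truly hold 45 coins, noise of 1d20 - 1d20 from fair dice with faces 1 to 20, and a comparison against a label of 50; the function name and default parameters are illustrative choices, not part of the original setup.

```python
import random


def prob_noisy_box_beats(threshold, true_coins=45, sides=20,
                         boxes=9, trials=20000, seed=0):
    """Estimate P(at least one noisy label exceeds `threshold`).

    Every noisy box holds `true_coins`; its label adds die2 - die1
    for two fair dice with faces 1..sides (an assumed noise model).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        labels = [
            true_coins + rng.randint(1, sides) - rng.randint(1, sides)
            for _ in range(boxes)
        ]
        if max(labels) > threshold:
            hits += 1
    return hits / trials


print(prob_noisy_box_beats(50))
```

Under these assumptions the estimate comes out above 0.9: in the large majority of draws at least one of the nine 45-coin boxes carries a label higher than the reliable 50-coin box.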
But if you don't know that all the high variance boxes have the same mean, then 60 is the one to go with. And if you do know they have the same mean, then its expected value is no longer 60.
Imagine putting gold coins into a bunch of boxes by having them normally distributed about 50 gold coins with standard deviation 10. Then we'll add some Gaussian noise to the estimates on the boxes - but we'll split them into 2 groups. Ten boxes will have noise with standard deviation of 5, while the other ten will have a standard deviation of 25.
But since I've still kept the simple situation where we just have 2 groups, you can get the overall biggest by just picking the biggest from each group and comparing them. So we can treat the groups independently for a bit. The biggest one is going to have the biggest positive deviation from 50, combined signal and noise. Because I used normal distributions this time, the combined prior+noise distribution is just a bigger normal distribution. So given that something is big or small by this combined distribution, how do we expect the signal and noise distributions to shift? Well, it would be silly to expect one of them to be more improbable than the other, so we expect their means to shift by about the same number of standard deviations for each distribution. This right there means that the bigger the noise, the more of the variation we should attribute to noise. And also the bigger the element in the combined distribution, the larger we should expect its noise to be.
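For this all-Gaussian setup the split between signal and noise can be written down exactly using the standard normal-normal update. The sketch below assumes the prior N(50, 10^2) and the two noise standard deviations (5 and 25) from the example; `posterior_mean` is a hypothetical helper name.

```python
PRIOR_MEAN = 50.0  # mean of the true coin counts (from the example)
PRIOR_SD = 10.0    # standard deviation of the true coin counts


def posterior_mean(label, noise_sd):
    """Expected true coins given the label, for a normal prior and normal noise.

    The label's deviation from the prior mean is shrunk by the factor
    prior_variance / (prior_variance + noise_variance).
    """
    prior_var = PRIOR_SD ** 2
    weight = prior_var / (prior_var + noise_sd ** 2)
    return PRIOR_MEAN + weight * (label - PRIOR_MEAN)


# The same label of 80 on a low-noise and a high-noise box:
print(posterior_mean(80, 5))   # 74.0
print(posterior_mean(80, 25))  # about 54.1
```

In the exact calculation the deviation from 50 is divided between signal and noise in proportion to their variances, which points the same way as the argument above: the bigger the noise, the more of the deviation gets attributed to noise, and the more extreme the label, the larger the expected noise.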
But if you know the boxes were originally drawn from N(50,100) then the number on the box is no longer the correct Bayesian mean. All I'm arguing is that once you have your Bayesian expected value you don't need to update it any further.
That's pretty uncontroversial, but in practice it means that you end up penalizing high-noise boxes with high values (and boosting high-noise boxes with low values), which I think is a nontrivial result.
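A quick simulation can illustrate how much the penalty matters in practice. This is a sketch under the assumptions of the twenty-box example (true coins drawn from N(50, 10^2), ten boxes with noise sd 5 and ten with noise sd 25); the names `shrink` and `compare_strategies` are invented for illustration.

```python
import random

PRIOR_MEAN, PRIOR_SD = 50.0, 10.0


def shrink(label, noise_sd):
    """Posterior mean of the true coins under a normal prior and normal noise."""
    weight = PRIOR_SD ** 2 / (PRIOR_SD ** 2 + noise_sd ** 2)
    return PRIOR_MEAN + weight * (label - PRIOR_MEAN)


def compare_strategies(trials=5000, seed=0):
    """Average true coins won by picking the top raw label vs. the top posterior mean."""
    rng = random.Random(seed)
    noise_sds = [5.0] * 10 + [25.0] * 10
    raw_total = shrunk_total = 0.0
    for _ in range(trials):
        boxes = []
        for sd in noise_sds:
            coins = rng.gauss(PRIOR_MEAN, PRIOR_SD)
            label = coins + rng.gauss(0.0, sd)
            boxes.append((coins, label, sd))
        # Strategy 1: trust the number printed on the box.
        raw_pick = max(boxes, key=lambda b: b[1])
        # Strategy 2: rank boxes by their Bayesian expected value.
        shrunk_pick = max(boxes, key=lambda b: shrink(b[1], b[2]))
        raw_total += raw_pick[0]
        shrunk_total += shrunk_pick[0]
    return raw_total / trials, shrunk_total / trials


raw_avg, shrunk_avg = compare_strategies()
print(raw_avg, shrunk_avg)
```

With these parameters, ranking by the posterior mean should win noticeably more coins on average than ranking by the raw label, because the raw ranking keeps selecting high-noise boxes whose labels were inflated by luck.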