In any group there's going to be random noise, and if you choose an extreme value, chances are that value was inflated by noise. In Bayesian, given that something has the highest value, it probably had positive noise, not just positive signal. So the correction is to correct out the expected positive noise you get from explicitly choosing the highest value. Naturally, this correction is greater for when the noise is bigger.
So imagine choosing between black boxes. Each black box has some number of gold coins in it, and also two numbers written on it. The first number, A, on the box is like the estimated expected value, and the second number, B, is like the variance. What happened is that someone rolled two distinct dice with B sides, subtracted die 1 from die 2, and added that to the number of gold coins in the box.
So if you see a box with 40, 3 written on it, you know that it has an expected value of 40 gold coins, but might have as few as 37 or as many as 43.
Now comes the problem: I put 10 boxes in front of you, and tell you to choose the one with the most gold coins. The first box is 50, 1 - a very low-variance box. But the last 9 boxes are all high-uncertainty, all with B=20. The expected values printed on them are as follows [I generated the boxes honestly] : 53, 52, 37, 60, 44, 36, 56, 45, 54. Ooh, one of those boxes has a 60 on it! Pick that one!
Okay, don't pick that one. Think about it - there are 9 boxes with high variance, and the one you picked probably has unusually large noise. To be special among 9 proposals with high variance, it probably has noise at the 80th+ percentile. What's the 80th percentile of noise for 1d20 - 1d20? I bet it's larger than 10. You're better off just going with the 50, 1 box.
And it's a good thing you applied that correction, because I generated the boxes by typing "RandomInteger[20,9] - RandomInteger[20,9] + 45" into Wolfram alpha - they each 45 coins each.
So this illustrates that what beating the optimizer's curse really is is a sort of "correction for multiple comparisons." If you have a lot of noisy boxes, some of them will look large even when they're not, even larger than non-noisy boxes.
I'm trying to figure out why, from the rules you gave at the start, we can assume that box 60 has more noise than the other boxes with variance of 20. You didn't, at the outset of the problem, say anything about what the values in the boxes actually were. I would not, taking this experiment, have been surprised to see a box labeled "200", with a variance of 20, because the rules didn't say anything about values being close to 50, just close to A. Well, I would've been surprised with you as a test-giver, but it wouldn't have violated what I unders...
The best laid schemes of mice and men
Go often askew,
And leave us nothing but grief and pain,
For promised joy!
- Robert Burns (translated)
Consider the following question:
Or, suppose Holden Karnofsky of charity-evaluator GiveWell has been presented with a complex analysis of why an intervention that reduces existential risks from artificial intelligence has astronomical expected value and is therefore the type of intervention that should receive marginal philanthropic dollars. Holden feels skeptical about this 'explicit estimated expected value' approach; is his skepticism justified?
Suppose you're a business executive considering n alternatives whose 'true' expected values are μ1, ..., μn. By 'true' expected value I mean the expected value you would calculate if you could devote unlimited time, money, and computational resources to making the expected value calculation.2 But you only have three months and $50,000 with which to produce the estimate, and this limited study produces estimated expected values for the alternatives V1, ..., Vn.
Of course, you choose the alternative i* that has the highest estimated expected value Vi*. You implement the chosen alternative, and get the realized value xi*.
Let's call the difference xi* - Vi* the 'postdecision surprise'.3 A positive surprise means your option brought about more value than your analysis predicted; a negative surprise means you were disappointed.
Assume, too kindly, that your estimates are unbiased. And suppose you use this decision procedure many times, for many different decisions, and your estimates are unbiased. It seems reasonable to expect that on average you will receive the estimated expected value of each decision you make in this way. Sometimes you'll be positively surprised, sometimes negatively surprised, but on average you should get the estimated expected value for each decision.
Alas, this is not so; your outcome will usually be worse than what you predicted, even if your estimate was unbiased!
Why?
This is "the optimizer's curse." See Smith & Winkler (2006) for the proof.
The Solution
The solution to the optimizer's curse is rather straightforward.
To return to our original question: Yes, some skepticism is justified when considering the option before you with the highest expected value. To minimize your prediction error, treat the results of your decision analysis as uncertain and use Bayes' Theorem to combine its results with an appropriate prior.
Notes
1 Smith & Winkler (2006).
2 Lindley et al. (1979) and Lindley (1986) talk about 'true' expected values in this way.
3 Following Harrison & March (1984).
4 Quote and (adapted) image from Russell & Norvig (2009), pp. 618-619.
5 Smith & Winkler (2006).
References
Harrison & March (1984). Decision making and postdecision surprises. Administrative Science Quarterly, 29: 26–42.
Lindley, Tversky, & Brown. 1979. On the reconciliation of probability assessments. Journal of the Royal Statistical Society, Series A, 142: 146–180.
Lindley (1986). The reconciliation of decision analyses. Operations Research, 34: 289–295.
Russell & Norvig (2009). Artificial Intelligence: A Modern Approach, Third Edition. Prentice Hall.
Smith & Winkler (2006). The optimizer's curse: Skepticism and postdecision surprise in decision analysis. Management Science, 52: 311-322.