Or am I missing some key factor here? Did I misinterpret the lesson?
The key factor is that the 60,20 box is not in isolation - it is the top box, and so not only do you expect it to have more "signal" (gold) than average, you also expect it to have more noise than average.
You can think of the numbers on the boxes as drawn from a probability distribution. If there was 0 noise, this probability distribution would just be how the gold in the boxes was distributed. But if you add noise, it's like adding two probability distributions together. If you're not familiar with what happens, go look it up on wikipedia, but the upshot is that the combined distribution is more spread out than the original. This combined distribution isn't just noise or just signal, it's the probability of having some number be written on the outside of the box.
And so if something is the top, very highest box, where should it be located on the combined distribution?
Now, if you have something that's high on the combined distribution, how much of that is due to signal, and how much of it is due to noise? This is a tougher question, but the essential insight is that the noise shouldn't be more improbable than the signal, or vice versa - that is, they should both be about the same number of standard deviations from their means.
This means that if the standard deviation of the noise is bigger, then the probable contribution of the noise is greater.
Me saying the same thing a different way can be found here.
Oh, I understand now. Even if we don't know how it's distributed, if it's the top among 9 choices with the same variance that puts it in the 80th percentile for specialness, and signal and noise contribute to that equally. So it's likely to be in the 80th percentile of noise.
It might have been clearer if you'd instead made the boxes actually contain coins normally distributed about 40 with variance 15 and B=30, and made an alternative of 50/1, since you'd have been holding yourself to more proper unbiased generation of the numbers and still, in all likelih...
The best laid schemes of mice and men
Go often askew,
And leave us nothing but grief and pain,
For promised joy!
- Robert Burns (translated)
Consider the following question:
Or, suppose Holden Karnofsky of charity-evaluator GiveWell has been presented with a complex analysis of why an intervention that reduces existential risks from artificial intelligence has astronomical expected value and is therefore the type of intervention that should receive marginal philanthropic dollars. Holden feels skeptical about this 'explicit estimated expected value' approach; is his skepticism justified?
Suppose you're a business executive considering n alternatives whose 'true' expected values are μ1, ..., μn. By 'true' expected value I mean the expected value you would calculate if you could devote unlimited time, money, and computational resources to making the expected value calculation.2 But you only have three months and $50,000 with which to produce the estimate, and this limited study produces estimated expected values for the alternatives V1, ..., Vn.
Of course, you choose the alternative i* that has the highest estimated expected value Vi*. You implement the chosen alternative, and get the realized value xi*.
Let's call the difference xi* - Vi* the 'postdecision surprise'.3 A positive surprise means your option brought about more value than your analysis predicted; a negative surprise means you were disappointed.
Assume, too kindly, that your estimates are unbiased. And suppose you use this decision procedure many times, for many different decisions, and your estimates are unbiased. It seems reasonable to expect that on average you will receive the estimated expected value of each decision you make in this way. Sometimes you'll be positively surprised, sometimes negatively surprised, but on average you should get the estimated expected value for each decision.
Alas, this is not so; your outcome will usually be worse than what you predicted, even if your estimate was unbiased!
Why?
This is "the optimizer's curse." See Smith & Winkler (2006) for the proof.
The Solution
The solution to the optimizer's curse is rather straightforward.
To return to our original question: Yes, some skepticism is justified when considering the option before you with the highest expected value. To minimize your prediction error, treat the results of your decision analysis as uncertain and use Bayes' Theorem to combine its results with an appropriate prior.
Notes
1 Smith & Winkler (2006).
2 Lindley et al. (1979) and Lindley (1986) talk about 'true' expected values in this way.
3 Following Harrison & March (1984).
4 Quote and (adapted) image from Russell & Norvig (2009), pp. 618-619.
5 Smith & Winkler (2006).
References
Harrison & March (1984). Decision making and postdecision surprises. Administrative Science Quarterly, 29: 26–42.
Lindley, Tversky, & Brown. 1979. On the reconciliation of probability assessments. Journal of the Royal Statistical Society, Series A, 142: 146–180.
Lindley (1986). The reconciliation of decision analyses. Operations Research, 34: 289–295.
Russell & Norvig (2009). Artificial Intelligence: A Modern Approach, Third Edition. Prentice Hall.
Smith & Winkler (2006). The optimizer's curse: Skepticism and postdecision surprise in decision analysis. Management Science, 52: 311-322.