But all you've done after "adjusting" the expected value estimates is produce a new batch of expected value estimates, which just shows either that the original estimates were not made very carefully (if there was an improvement), or that you face the same problem all over again...
Am I missing something?
In statistics the solution you describe is called Hierarchical or Multilevel Modeling. You assume that your data is drawn from a set of distributions which have their parameters drawn from another distribution. This automatically shrinks your estimates of the distributions towards the mean. I think it's a pretty useful trick to know, and it would be good to do a writeup, but you might need a decent grasp of Bayesian statistics first.
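For concreteness, here is a minimal empirical-Bayes sketch of that shrinkage (the numbers are made up, and estimating the hyperparameters from the data this way is a rough shortcut; a full hierarchical model would fit them properly):

```python
import numpy as np

# Hypothetical unbiased estimates for several options, with their standard errors.
estimates = np.array([10.0, 25.0, 40.0, 5.0, 60.0])
std_errors = np.array([5.0, 5.0, 20.0, 5.0, 30.0])

# Rough empirical-Bayes hyperparameters: prior mean = grand mean of the
# estimates; prior variance = observed variance minus the average sampling
# variance (floored at a small positive number).
prior_mean = estimates.mean()
prior_var = max(estimates.var() - (std_errors ** 2).mean(), 1e-9)

# Normal-normal conjugate update: each posterior mean is a precision-weighted
# average of the raw estimate and the prior mean.
w = prior_var / (prior_var + std_errors ** 2)
shrunk = w * estimates + (1 - w) * prior_mean
print(shrunk)  # the noisier an estimate, the harder it is pulled toward the mean
```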
The central point of the optimizer's curse is not one I have seen before, and it is a very interesting point.
The solution, however, leaves me feeling slightly unhappy. It isn't obvious to me what prior one should use in this sort of context. I suspect that a rough estimate obtained from the rule of thumb that the more complicated a logical chain is, the more likely there is a problem in it, might do similar work at a weaker level.
Have you tried to apply this sort of reasoning explicitly to various existential risk considerations? If so, what did you get?
The central point of the optimizer's curse is not one I have seen before, and it is a very interesting point.
Reminds me of the winner's curse in auctions - the selected bid is the one that is the highest and so most likely to be due to overconfidence/bias.
Am I missing something, or does the post just say that we shouldn't use frequentist "unbiased estimators" as if they were Bayesian posterior expected values?
consider a decision problem in which there are k choices, each of which has a true estimated [expected value] of 0.
Lukeprog, if I've understood you correctly, then this is no good; this is a corner case. The question to be answered here is whether we should expect a "common sense" executive who favors plans with a high prior estimate to do better than a "technical" analyst who favors plans that perform well according to the formal estimation criteria. Assuming that all prior estimates are identical except for bias ensures that the technical analyst will win. This, however, begs the question. One could just as easily assume that there is large variation in the true expected values, and that the formal criteria will always produce an estimate of 0, in which case the common sense executive will always win.
Am I missing something? I like the topic; I would enjoy reading about which approach we should expect to perform better in a typical situation.
I think the case where all the choices have a "true expected value" of 0 is picked out merely to illustrate the problem.
Is there an example where applying this correction to the expected values changes the decision?
In any group there's going to be random noise, and if you choose an extreme value, chances are that value was inflated by noise. In Bayesian terms: given that something has the highest value, it probably had positive noise, not just positive signal. So the correction is to subtract out the expected positive noise you get from explicitly choosing the highest value. Naturally, this correction is greater when the noise is bigger.
So imagine choosing between black boxes. Each black box has some number of gold coins in it, and also two numbers written on it. The first number, A, on the box is like the estimated expected value, and the second number, B, is like the variance. What happened is that someone rolled two distinct dice with B sides each, subtracted die 1 from die 2, and added the difference to the number of gold coins in the box to get A.
So if you see a box with 40, 3 written on it, you know that its estimated value is 40 gold coins, but it might hold as few as 38 or as many as 42.
Now comes the problem: I put 10 boxes in front of you, and tell you to choose the one with the most gold coins. The first box is 50, 1 - a very low-variance box. But the last 9 boxes are all high-uncertainty, all with ...
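To see the curse numerically, here's a quick simulation of this kind of setup (a sketch; since the comment is cut off, the coin counts and variances below are made-up stand-ins):

```python
import random

def make_box(coins, b):
    """Label a box: estimate A = coins + (die2 - die1), using b-sided dice."""
    noise = random.randint(1, b) - random.randint(1, b)  # E[noise] = 0, so A is unbiased
    return coins + noise, coins

random.seed(0)
surprises = []
for _ in range(10_000):
    # One low-variance box and nine high-variance boxes, all truly worth 50 coins.
    boxes = [make_box(50, 1)] + [make_box(50, 20) for _ in range(9)]
    estimate, actual = max(boxes)        # choose the box with the highest estimate A
    surprises.append(actual - estimate)  # postdecision surprise

print(sum(surprises) / len(surprises))   # reliably negative: the chosen box disappoints
```

Even though every label is unbiased on its own, picking the maximum selects for positive noise, so the average surprise comes out negative.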
Sometimes contractors run out of money before finishing and you have to pay more or they leave you with a half-finished project :(
I'm not sure how exactly this differs from the GiveWell blog post along the same lines? You both seem to be dealing with roughly the same problem (decision making under uncertainty), and reach the same conclusion (pay attention to the standard deviation, use Bayesian updates).
I did find your graph in the middle a rather useful illustration, but otherwise don't feel like I've come away with anything really new...
This is interesting, but I don't see how to apply the solution. Presumably either I have no priors, or the priors are going to be generated by the same process I use to generate the values I am combining them with.
The resulting bias should be smaller if you choose the top 2 or 3 alternatives. E.g., give to 3 charities, not to 1.
How do market traders deal with this problem?
If I understand this correctly, there's an empirical problem.
How optimistic your most optimistic estimate turns out to be is a matter of temperament and knowledge for individuals, and of group culture for groups. It seems to me that the correction would need to be determined by experience. Or is this the "appropriate prior" problem?
When I'd only seen the title for this article, I thought it was going to be about the question of how much effort you should put into optimizing.
This is nit-picky, but I don't think you should attribute to Robert Burns anything other than the words he actually wrote. Meanings change a lot in translation, and it's not quite fair to do that through invisible sleight of hand. "Robert Burns (standard English translation)" would serve to CYA.
The original lines:
The best laid schemes o' Mice an' Men,
Gang aft agley,
An' lea'e us nought but grief an' pain,
For promis'd joy!
differ little from the version Luke quoted, and are mostly understandable (with the exception of "gang aft agley") to a sophisticated English reader with no special knowledge. I am somewhat inclined to call that version a rewrite rather than a translation, just as I would consider some modernized versions of Shakespeare to be not translations but rewrites.
The standard problem of drawing lines in a continuum rears its head again. There are some reasonable arguments for calling Scots from this time a dialect of English, and many others for calling it a separate language. This is complicated by people's personal and national identities being involved. Questions like these generally end up being settled more by politics than by details of the different linguistic varieties involved.
You are thereby signalling that not only do YOU read Scots Gaelic (fluently, of course), but you expect everyone you come into contact with socially to ALSO be fluent in Scots Gaelic.
Scots Gaelic is not Scots (is not Scottish English, though modern speakers of Scots do generally code switch into it with ease, sometimes in a continuous way). Scots Gaelic is a Gaelic, Celtic language. Scots is Germanic. Burns wrote in Scots.
Scots Gaelic is a thing, but it is not the language in which Burns wrote. That's just called Scots. I wouldn't ordinarily have mentioned it, but... you're coming off as a bit snobby here. (O wad some Power the giftie gie us, am I right?)
that the high-status thing to do is to provide quotes in the original language without translation
This may be high status in certain social circles (having interacted with snooty Ivy League educated New York poets, I can say they certainly think so), but to a lot of people doing so comes across as obnoxious and pretentious; that is, an attempt to blatantly signal high status in a way that actually signals low status.
The highest status thing to do (and just optimal as far as I can tell for actually conveying information) is to include the original and the translation also.
I find it interesting that everyone here is focusing on status; couldn't it just be that crediting translations is absolutely necessary for the basic scholarly purpose of judging the authority and trustworthiness of the translation, and even of the original text? And that failing to provide attribution demonstrates a lack of academic expertise, general ignorance of the slipperiness of translation ("hey, how important could it be?"), and other such problems?
I know I find such information indispensable for my anime Evangelion research (I treat translations coming from ADV very differently from translations by Olivier Hague and that different from translations by Bochan_bird, and so on, to give a few examples), so how much more so for real scholarship?
Well, what I originally [see edit] wrote was "It's wrong (deprives the translator of rightful credit) -- and, FWIW, it's also low-status." I think people found the "low-status" part of my claim more interesting, but it wasn't the primary reason I reacted badly to seeing a translation uncredited as such.
Edit: on reflection, this wasn't my original justification. I simply reacted with gut-level intuition, knowing it was wrong. Every other explanation is after-the-fact, and therefore suspect.
Note Carl Shulman's counterargument to the assumption of a normal prior here and the comments traded between Holden and Carl.
"If your prior was that charity cost-effectiveness levels were normally distributed, then no conceivable evidence could convince you that a charity could be 100x as good as the 90th percentile charity. The probability of systematic error or hoax would always be ludicrously larger than the chance of such an effective charity. One could not believe, even in hindsight, that paying for Norman Borlaug’s team to work on the Green Revo...
Quick feedback or question.
In this part: "Assume, too kindly, that your estimates are unbiased. And suppose you use this decision procedure many times, for many different decisions, and your estimates are unbiased."
The second mention of "unbiased" makes no sense to me and looks like a typo.
If X = Skill + Luck, with Skill and Luck both random variables, then selecting max(X) will get you something that has high Skill and high Luck.
If Estimate = TrueVal + Error, then max(Estimate) will have both high TrueVal and high Error.
This obvious insight has many applications, especially when the selection is done over a very large number of entities, e.g. trying to emulate the habits of billionaires in order to become rich.
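A small simulation makes it explicit how the selected Error grows with the size of the pool (standard-normal Skill and Luck are an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 5_000

# Skill (TrueVal) and Luck (Error) are independent standard normals;
# Estimate = TrueVal + Error. We pick the max Estimate among k candidates.
for k in [2, 10, 100, 1_000]:
    true_val = rng.normal(size=(trials, k))
    error = rng.normal(size=(trials, k))
    winner = np.argmax(true_val + error, axis=1)
    picked_error = error[np.arange(trials), winner]
    print(f"k={k:5d}  mean Error of the selected entity: {picked_error.mean():.2f}")
```

By symmetry, the winner's advantage here splits evenly between Skill and Luck, so the larger the pool (billionaires drawn from millions of entrepreneurs), the more of what you observe in the winner is Luck.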
Very interesting. I'm going to try my hand at a short summary:
Assume that you have a number of different options you can choose, that you want to estimate the value of each option, and that you have to make your best guess as to which option is most valuable. In step 1, you generate individual estimates using whatever procedure you think is best. In step 2, you make the final decision by choosing the option that had the highest estimate in step 1.
The point is: even if you have unbiased procedures for creating the individual estimates in step one (ie procedur...
The best laid schemes of mice and men
Go often askew,
And leave us nothing but grief and pain,
For promised joy!
- Robert Burns (translated)
Consider the following question:
Or, suppose Holden Karnofsky of charity-evaluator GiveWell has been presented with a complex analysis of why an intervention that reduces existential risks from artificial intelligence has astronomical expected value and is therefore the type of intervention that should receive marginal philanthropic dollars. Holden feels skeptical about this 'explicit estimated expected value' approach; is his skepticism justified?
Suppose you're a business executive considering n alternatives whose 'true' expected values are μ₁, ..., μₙ. By 'true' expected value I mean the expected value you would calculate if you could devote unlimited time, money, and computational resources to making the expected value calculation.[2] But you only have three months and $50,000 with which to produce the estimate, and this limited study produces estimated expected values for the alternatives V₁, ..., Vₙ.
Of course, you choose the alternative i* that has the highest estimated expected value Vᵢ*. You implement the chosen alternative, and get the realized value xᵢ*.
Let's call the difference xᵢ* − Vᵢ* the 'postdecision surprise'.[3] A positive surprise means your option brought about more value than your analysis predicted; a negative surprise means you were disappointed.
Assume, too kindly, that your estimates are unbiased. And suppose you use this decision procedure many times, for many different decisions, and your estimates are unbiased. It seems reasonable to expect that on average you will receive the estimated expected value of each decision you make in this way. Sometimes you'll be positively surprised, sometimes negatively surprised, but on average you should get the estimated expected value for each decision.
Alas, this is not so; your outcome will usually be worse than what you predicted, even if your estimate was unbiased!
Why? Because you didn't pick an alternative at random: you picked the one with the highest estimate, and the highest estimate is disproportionately likely to be one whose estimation error happened to be positive. Conditional on being selected, the estimate is biased upward, even though each estimate is unbiased on its own.
This is "the optimizer's curse." See Smith & Winkler (2006) for the proof.
The Solution
The solution to the optimizer's curse is rather straightforward.
To return to our original question: Yes, some skepticism is justified when considering the option before you with the highest expected value. To minimize your prediction error, treat the results of your decision analysis as uncertain and use Bayes' Theorem to combine its results with an appropriate prior.
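As a sketch of what that combination can look like in the simplest conjugate case (normal prior, normal estimation error; the numbers below are hypothetical, and see Carl Shulman's objection in the comments that a normal prior is itself a contestable modeling choice):

```python
import numpy as np

def posterior_mean(estimate, est_var, prior_mean, prior_var):
    """Normal-normal Bayes update: precision-weighted average of prior and estimate."""
    w = prior_var / (prior_var + est_var)
    return w * estimate + (1 - w) * prior_mean

# Three hypothetical interventions: raw expected-value estimates, and how noisy
# we judge each estimate to be, combined with a shared prior over interventions.
estimates = np.array([3.0, 10.0, 1000.0])
est_vars = np.array([1.0, 25.0, 1e6])
print(posterior_mean(estimates, est_vars, prior_mean=2.0, prior_var=9.0))
# -> roughly [2.9, 4.12, 2.01]: the wildest claim is shrunk the hardest
```

The noisier the analysis behind an estimate, the less the posterior moves away from the prior, which is exactly the skepticism the astronomical-expected-value estimate calls for.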
Notes
[1] Smith & Winkler (2006).
[2] Lindley et al. (1979) and Lindley (1986) talk about 'true' expected values in this way.
[3] Following Harrison & March (1984).
[4] Quote and (adapted) image from Russell & Norvig (2009), pp. 618–619.
[5] Smith & Winkler (2006).
References
Harrison & March (1984). Decision making and postdecision surprises. Administrative Science Quarterly, 29: 26–42.
Lindley, Tversky, & Brown (1979). On the reconciliation of probability assessments. Journal of the Royal Statistical Society, Series A, 142: 146–180.
Lindley (1986). The reconciliation of decision analyses. Operations Research, 34: 289–295.
Russell & Norvig (2009). Artificial Intelligence: A Modern Approach, Third Edition. Prentice Hall.
Smith & Winkler (2006). The optimizer's curse: Skepticism and postdecision surprise in decision analysis. Management Science, 52: 311–322.