Oscar_Cunningham comments on The Optimizer's Curse and How to Beat It - Less Wrong

44 Post author: lukeprog 16 September 2011 02:46AM

Comments (81)

Comment author: Oscar_Cunningham 17 September 2011 11:25:11PM 0 points

But if you don't know that all the high-variance boxes have the same mean, then 60 is the one to go with. And if you do know they have the same mean, then its expected value is no longer 60.

Comment author: Manfred 18 September 2011 08:48:21AM 1 point

Imagine putting gold coins into a bunch of boxes, with the number of coins in each box normally distributed about 50 with standard deviation 10. Then we'll add some Gaussian noise to the estimates written on the boxes, but we'll split the boxes into 2 groups: ten boxes will have noise with a standard deviation of 5, while the other ten will have noise with a standard deviation of 25.
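For anyone who wants to play with this setup, here is a minimal Python sketch of it. The names (`make_boxes`, `average_overshoot`), the seed, and the trial count are made up for illustration, and coin counts are left as real numbers rather than rounded to whole coins. It only measures how far the biggest label tends to overstate what is actually in that box.

```python
import random

PRIOR_MEAN = 50.0   # average coins per box, as in the comment
PRIOR_SD = 10.0     # spread of the true contents
NOISE_SDS = [5.0] * 10 + [25.0] * 10  # ten low-noise boxes, ten high-noise boxes


def make_boxes(rng):
    """Return one (true_coins, label, noise_sd) tuple per box."""
    boxes = []
    for noise_sd in NOISE_SDS:
        true_coins = rng.gauss(PRIOR_MEAN, PRIOR_SD)
        label = true_coins + rng.gauss(0.0, noise_sd)  # noisy estimate on the box
        boxes.append((true_coins, label, noise_sd))
    return boxes


def average_overshoot(trials=20000, seed=0):
    """Average of (label - true contents) for the box with the biggest label."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        true_coins, label, _ = max(make_boxes(rng), key=lambda box: box[1])
        total += label - true_coins
    return total / trials


if __name__ == "__main__":
    print(f"average overshoot of the top label: {average_overshoot():.1f} coins")
```

If the argument in this thread is right, the printed overshoot should come out well above zero, since the top label is usually one that got lucky with its noise.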

But since I've still kept the simple situation where we just have 2 groups, you can get the overall biggest by just picking the biggest from each group and comparing them. So we can treat the groups independently for a bit. Within a group, the biggest label is going to have the biggest positive deviation from 50, signal and noise combined. Because I used normal distributions this time, the combined prior+noise distribution is just a wider normal distribution. So given that something is big or small by this combined distribution, how do we expect the signal and the noise to have shifted? Well, it would be silly to expect one of them to be more improbable than the other, so we expect each to have shifted by about the same number of its own standard deviations. This right there means that the bigger the noise, the more of the deviation we should attribute to noise. And also the bigger the deviation in the combined distribution, the larger we should expect its noise component to be.
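For normal distributions the split between signal and noise can be written down exactly. In the sketch below the symbols are my own shorthand, not anything from the comments above: s is the true contents minus 50, n is the noise, and d = s + n is how far the label sits from 50. The exact split goes in proportion to the variances, which, if anything, puts even more of the deviation on the noisier component than an equal-standard-deviations rule of thumb would.

```latex
\begin{align*}
s &\sim \mathcal{N}(0, \sigma_s^2), \qquad n \sim \mathcal{N}(0, \sigma_n^2), \qquad d = s + n \sim \mathcal{N}(0, \sigma_s^2 + \sigma_n^2) \\
\mathbb{E}[s \mid d] &= \frac{\sigma_s^2}{\sigma_s^2 + \sigma_n^2}\, d, \qquad
\mathbb{E}[n \mid d] = \frac{\sigma_n^2}{\sigma_s^2 + \sigma_n^2}\, d \\
\sigma_s = 10,\ \sigma_n = 5 &: \quad \mathbb{E}[n \mid d] = \tfrac{25}{125}\, d = 0.2\, d \\
\sigma_s = 10,\ \sigma_n = 25 &: \quad \mathbb{E}[n \mid d] = \tfrac{625}{725}\, d \approx 0.86\, d
\end{align*}
```

So in the low-noise group about a fifth of any deviation from 50 is expected to be noise, while in the high-noise group it is closer to six sevenths.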

Comment author: Oscar_Cunningham 18 September 2011 09:45:46AM 0 points

But if you know the boxes were originally drawn from N(50,100) then the number on the box is no longer the correct Bayesian mean. All I'm arguing is that once you have your Bayesian expected value you don't need to update it any further.

Comment author: Manfred 18 September 2011 10:13:42AM * 3 points

"All I'm arguing is that once you have your Bayesian expected value you don't need to update it any further."

That's pretty uncontroversial, but in practice it means that you end up penalizing high-noise boxes with high values (and boosting high-noise boxes with low values), which I think is a nontrivial result.
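To put rough numbers on that, here is a small Python sketch under the assumptions used earlier in the thread: contents drawn from N(50,100) and independent Gaussian noise on each label. The function name `posterior_mean` and the example labels 80 and 20 are mine, chosen only for illustration.

```python
def posterior_mean(label, noise_sd, prior_mean=50.0, prior_sd=10.0):
    """Posterior mean of a box's contents under a normal prior and normal noise."""
    prior_var = prior_sd ** 2
    noise_var = noise_sd ** 2
    # Fraction of the label's deviation from the prior mean that we keep.
    weight = prior_var / (prior_var + noise_var)
    return prior_mean + weight * (label - prior_mean)


if __name__ == "__main__":
    for label in (80.0, 20.0):
        for noise_sd in (5.0, 25.0):
            estimate = posterior_mean(label, noise_sd)
            print(f"label {label:.0f}, noise sd {noise_sd:.0f}: {estimate:.1f}")
```

Under these assumptions a label of 80 shrinks to 74 in the low-noise group but to about 54 in the high-noise group, and a label of 20 rises to 26 and to about 46 respectively. Both groups get pulled toward 50; the high-noise boxes just get pulled much harder.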