mfb comments on Raising the forecasting waterline (part 1) - LessWrong

32 Post author: Morendil 09 October 2012 03:49PM


Comment author: mfb 13 October 2012 01:38:46PM 0 points [-]

To calculate the Brier score, you used *your* assumption that meteorites have a 1-in-a-million chance of hitting a specific area. What about events where there is no natural way to arrive at such an assumption?

Let's use another example:

Assume that I predict, with 95% confidence, that neither Obama nor Romney will be elected. If that prediction comes true, it is amazing and indicates high predictive power (especially if I make multiple similar predictions and most of them come true).

Assume that I predict, with 95% confidence, that either Obama or Romney will be elected. If that prediction comes true, it is not surprising.

What is the difference? The second event is what others already expect. How can we quantify "difference from the expectations of others" and include it in the score? Perhaps with an additional weight: weight each prediction by its divergence from the expectations of others (the mean of the log ratio, or something like that).
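One way to make the "mean of the log ratio" idea concrete is to score each forecast by its log-score advantage over a consensus forecast. This is only a sketch of that proposal; the function name and the consensus probabilities below are illustrative, not taken from the thread:

```python
import math

def relative_log_score(p_mine: float, p_consensus: float, outcome: bool) -> float:
    """Log-score advantage of my forecast over the consensus forecast.

    Positive means my forecast was more informative than the consensus;
    a forecast that merely echoes the consensus scores exactly zero.
    """
    if not outcome:
        p_mine, p_consensus = 1.0 - p_mine, 1.0 - p_consensus
    return math.log(p_mine) - math.log(p_consensus)

# "Neither Obama nor Romney", 95% confidence, consensus ~1%, comes true:
surprising = relative_log_score(0.95, 0.01, True)    # large positive

# "Obama or Romney", 95% confidence, consensus ~99%, comes true:
unsurprising = relative_log_score(0.95, 0.99, True)  # slightly negative
```

Under this weighting the first prediction earns a large bonus precisely because it departed from what others expected, while the second earns roughly nothing.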

Comment author: Kindly 13 October 2012 03:31:40PM 0 points [-]

If the objective is to get a better score than others, then that helps, though it's not clear to me that it does so in any consistent way. In particular, the strategy that maximizes your expected score and the strategy that maximizes your probability of having the best score may well differ, and one of them might involve misreporting your actual degree of belief.

Comment author: Morendil 13 October 2012 02:45:08PM *  0 points [-]

How can we quantify "difference to expectations of others" and include it in the score?

What you're asking about falls out of the "refinement" part of the calibration/refinement decomposition of the Brier score. Over time, your score will end up much better than others' if you have better refinement (e.g. from "inside information", or from a superior methodology), even if everyone is identically (and perfectly) calibrated.

This is the difference between a weather forecast derived from a climate model (e.g. "I assign 68% probability to the proposition that the temperature today in your city is within one standard deviation of its average October temperature") and one derived from looking out the window.
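The decomposition being referred to can be computed directly. Below is a sketch of the standard Murphy decomposition of the Brier score into reliability (calibration), resolution (refinement), and base-rate uncertainty, binning forecasts by their exact stated probability; the function name is mine:

```python
from collections import defaultdict

def brier_decomposition(forecasts, outcomes):
    """Murphy decomposition: brier = reliability - resolution + uncertainty.

    forecasts: predicted probabilities (forecasts sharing a value form a bin).
    outcomes:  0/1 results, same length.
    Lower Brier is better; lower reliability and higher resolution are better.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        bins[p].append(o)
    # Reliability: how far each bin's stated probability is from its hit rate.
    reliability = sum(len(os) * (p - sum(os) / len(os)) ** 2
                      for p, os in bins.items()) / n
    # Resolution: how far each bin's hit rate is from the overall base rate.
    resolution = sum(len(os) * (sum(os) / len(os) - base_rate) ** 2
                     for os in bins.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    brier = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / n
    return brier, reliability, resolution, uncertainty
```

Two perfectly calibrated forecasters get zero reliability penalty, but the one with inside information earns more resolution and therefore a lower total score.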

ETA: what you say about my using an assumption is not correct. I've only been making the forecast well-specified, so that the way you said you allocated your probability mass gives us a proper loss function, and simplifying the calculation by using a uniform distribution for the remaining 90% of your probability mass. You can compute the loss function for any allocation of probability among outcomes that you care to name; the math just might become more complicated. I'm not making any assumptions about the probability distribution of the actual events, and neither does the math. It's quite general.
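The calculation being described can be sketched as a multi-outcome Brier score over an allocation of probability mass. The grid of one million cells and the function name are my illustration of the simplification, not numbers from the original post:

```python
def brier_multiclass(probs, actual_index):
    """Multi-outcome Brier score: sum_i (p_i - o_i)^2, where o_i is 1 for
    the outcome that happened and 0 otherwise. Lower is better."""
    return sum((p - (1.0 if i == actual_index else 0.0)) ** 2
               for i, p in enumerate(probs))

# 10% on "impact in this specific cell"; the remaining 90% spread
# uniformly over the other 999,999 cells (the simplification above).
n_cells = 1_000_000
probs = [0.10] + [0.90 / (n_cells - 1)] * (n_cells - 1)

score_if_hit = brier_multiclass(probs, 0)   # impact in the named cell
score_if_miss = brier_multiclass(probs, 1)  # impact somewhere else
```

Nothing here assumes anything about where meteorites actually land; the loss is computed the same way for any allocation of probability among the cells.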

Comment author: mfb 14 October 2012 09:17:00PM 0 points [-]

I can still make 100,000 lottery predictions and get a good score. I'm looking for a scoring system that cannot be gamed that way. OK: for each prediction, you can subtract the average score of all forecasters from your own score. That should work. Assuming the other predictions are rational too, the expected difference on the lottery predictions is zero.
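The subtract-the-average fix can be sketched as scoring each forecast against the field's forecast. The setup below (function names, the 1-in-a-million lottery probability, the 90%-vs-50% example) is illustrative:

```python
def brier(p, outcome):
    """Brier score for a binary forecast: (p - outcome)^2, lower is better."""
    return (p - outcome) ** 2

def diffs_vs_field(my_probs, field_probs, outcomes):
    """Per-prediction Brier difference against the field's forecast.
    Negative entries mean I beat the field; echoing the field scores zero."""
    return [brier(m, o) - brier(f, o)
            for m, f, o in zip(my_probs, field_probs, outcomes)]

# Lottery predictions: everyone, me included, says 1-in-a-million, so each
# difference is exactly zero no matter how many such predictions I pile up.
lottery = diffs_vs_field([1e-6] * 3, [1e-6] * 3, [0, 0, 1])

# Inside information: I say 90% where the field says 50%, and I'm right.
informed = diffs_vs_field([0.9], [0.5], [1])
```

Piling up lottery predictions contributes nothing to the relative score, while genuinely better-informed forecasts still show up as a negative (better) difference.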

I've only been making the forecast well-specified

I think "impact here (10% confidence), no impact at that place (90% confidence)" is already quite specific. It is a binary event.