Yes, I see - it seems like there are two ways to do this exercise.
1) Everybody writes their own predictions and arranges them into probability bins (either by sorting them after coming up with them, or by deciding up front to write 5 at 60%, 5 at 70%, etc.). You then check your calibration with a graph like Scott Alexander's.
2) Everybody writes their estimations for the same set of predictions - maybe you generate 50 as a group, and everyone writes down their most likely outcome and how confident they are in it. You then check your Brier score.
Both of these seem useful for different things. In 2), it's a sort of raw measure of how good you are at making accurate guesses. Lower confidence levels make your score worse. In 1), you're looking at calibration across probabilities: there are always going to be things you're only 50% or 70% sure about, and making those intervals reflect reality is as important as getting the things you're 95% certain of right.
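To make approach 1) concrete, here is a minimal sketch of the calibration check in Python. The function name `calibration_table` and the sample results are made up for illustration, not taken from the party itself. It groups predictions by stated confidence and compares that to how often they came true.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Group (confidence, came_true) pairs by stated confidence.

    Returns {confidence: (number_correct, number_total, observed_frequency)}.
    """
    bins = defaultdict(list)
    for confidence, came_true in predictions:
        bins[confidence].append(bool(came_true))

    table = {}
    for confidence in sorted(bins):
        outcomes = bins[confidence]
        correct = sum(outcomes)
        table[confidence] = (correct, len(outcomes), correct / len(outcomes))
    return table


# Hypothetical results for one person: (stated confidence, did it happen?)
example = [
    (0.6, True), (0.6, False), (0.6, True), (0.6, False), (0.6, True),
    (0.7, True), (0.7, True), (0.7, False), (0.7, True), (0.7, True),
    (0.9, True), (0.9, True), (0.9, True), (0.9, True), (0.9, False),
]

for confidence, (correct, total, observed) in calibration_table(example).items():
    print(f"stated {confidence:.0%}: {correct}/{total} came true ({observed:.0%})")
```

With only 5 predictions per bin the observed frequencies are very noisy, so a single year's table is a rough signal at best.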
I will edit the original post (in a bit) to reflect this.
Right, the two measures are calibration and accuracy. But calibration is part of accuracy.
"Lower confidence levels make your score worse"
Only if you guessed right. If you guessed wrong, lower confidence makes your score better. Under a "proper" scoring rule like Brier, you get the best possible score by honestly describing your uncertainty. Thus calibration — whether your 70% really happens 70% of the time — is a component of Brier score. If you improve your calibration, your Brier score will improve.
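A small Python sketch of the point above, assuming the usual definition of the Brier score as the mean squared difference between the stated probability and the outcome (1 if it happened, 0 if not), where lower is better. The helper names `brier_score` and `expected_brier` are my own.

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and outcomes (0 is best)."""
    return sum((p - float(outcome)) ** 2 for p, outcome in forecasts) / len(forecasts)


def expected_brier(stated, true_rate):
    """Expected single-prediction score when an event that really happens with
    probability true_rate is always reported at probability stated."""
    return true_rate * (stated - 1) ** 2 + (1 - true_rate) * stated ** 2


# Guessed right: dropping confidence from 90% to 60% makes the score worse (higher).
print(brier_score([(0.9, True)]), brier_score([(0.6, True)]))

# Guessed wrong: dropping confidence from 90% to 60% makes the score better (lower).
print(brier_score([(0.9, False)]), brier_score([(0.6, False)]))

# For events that really happen 70% of the time, try stating 0%, 10%, ..., 100%
# and see which stated probability gives the best expected score.
candidates = [i / 10 for i in range(11)]
best = min(candidates, key=lambda s: expected_brier(s, 0.7))
print(best)
```

The last part is the "proper scoring rule" property in miniature: among the candidates tried, the honest 70% is the one that minimizes the expected score.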
I think one should work on calibration befo...
TL;DR: Prediction & calibration parties are an exciting way for your EA/rationality/LessWrong group to practice rationality skills and celebrate the new year.
On December 30th, Seattle Rationality had a prediction party. Around 15 people showed up, brought snacks, brewed coffee, and spent several hours making predictions for 2017, and generating confidence levels for those predictions.
This was heavily inspired by Scott Alexander’s yearly predictions. (2014 results, 2015 results, 2016 predictions.) Our move was to turn this into a communal activity, with a few alterations to meet our needs and make it work better in a group.
Procedure:
To make this work in a group, we recommend the following:
This makes a good activity for rationality/EA groups for the following reasons:
Some examples of the predictions people used:
Also relevant: