I personally like this two-player calibration game, which I was introduced to by Paul Christiano at a meetup a couple of years ago:
There's no need to choose a minimum-width confidence interval (is there a technical term for that?). For example, "before 1920" would be an acceptable confidence interval for the question given above.
The big advantage of 50% confidence intervals over 90% confidence intervals (other than that they make a nice easy structure for the game) is that you get much faster feedback. 20 trials can meaningfully tell you that your 50% confidence intervals are off in one direction or the other. With 90% confidence intervals, 20 trials is enough to tell you if you're overconfident, but it can't tell you if you're underconfident, because even a perfectly calibrated player will quite often get all 20 right.
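The feedback-speed claim can be checked with binomial tail probabilities. The sketch below is only an illustration: the helper name `prob_at_least` and the cut-off of 15 hits are my own choices, not part of the game.

```python
from math import comb

def prob_at_least(hits, trials, p):
    """P(X >= hits) when X ~ Binomial(trials, p), i.e. for a perfectly calibrated player."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )

trials = 20

# 90% intervals: even the best possible score (20 out of 20) is unsurprising,
# so underconfidence cannot be detected. This is 0.9**20, about 0.12.
p_all_hit_90 = prob_at_least(20, trials, 0.9)

# 90% intervals: 15 or fewer hits would be surprising (under 5%),
# so overconfidence can be detected.
p_15_or_fewer_90 = 1 - prob_at_least(16, trials, 0.9)

# 50% intervals: 15 or more hits is surprising (about 0.02), and by symmetry
# so is 5 or fewer, so both directions can be detected.
p_15_plus_50 = prob_at_least(15, trials, 0.5)

print(f"90% intervals, 20/20 hits:      {p_all_hit_90:.3f}")
print(f"90% intervals, <=15 hits:       {p_15_or_fewer_90:.3f}")
print(f"50% intervals, >=15 hits:       {p_15_plus_50:.3f}")
```

With 50% intervals both tails are available as evidence; with 90% intervals there is simply no room above 20 out of 20.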
The big disadvantage is that 50% confidence intervals somehow don't feel as useful as 90% confidence intervals. I'm not sure this is really true, as there's nothing special about 90% (by my reckoning 50% is about as far away from 90% as 90% is from 98%), but it feels true. Of course, it's pretty trivial to change the game so it works with intervals other than 50%, but you have to play longer, and it gets more complicated.
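One way to make that "reckoning" concrete is to measure distance in log-odds. That choice of scale is an assumption on my part; the comparison above doesn't say which scale is meant. A minimal sketch:

```python
from math import log

def log_odds(p):
    """Natural-log odds of a probability p, with 0 < p < 1."""
    return log(p / (1 - p))

gap_50_to_90 = log_odds(0.9) - log_odds(0.5)   # ln(9), since the odds go from 1:1 to 9:1
gap_90_to_98 = log_odds(0.98) - log_odds(0.9)  # ln(49/9), odds go from 9:1 to 49:1

# Probability that is exactly as far above 90% as 90% is above 50%:
# multiplying the odds by 9 again gives 81:1.
odds = (0.9 / 0.1) ** 2
p_equal_gap = odds / (1 + odds)

print(f"50% -> 90%: {gap_50_to_90:.2f} nats")
print(f"90% -> 98%: {gap_90_to_98:.2f} nats")
print(f"equal-gap probability: {p_equal_gap:.4f}")
```

On this scale the two gaps are of the same order, with the exact equal-gap point falling between 98% and 99%.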
In the book "How to Measure Anything", Douglas W. Hubbard presents a step-by-step method for calibrating your confidence intervals, which he has tested on hundreds of people, showing that it can make 90% of people almost perfect estimators within half a day of training.
I've been told that the Less Wrong and CFAR community is mostly not aware of this work, so given the importance of making good estimates to rationality, I thought it would be of interest.
(Note, though, that CFAR has developed its own games for training confidence interval calibration.)
The main techniques to employ are:
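To train yourself, practice making estimates repeatedly while using these techniques, until you are well calibrated: your intervals should contain the true answer about as often as their stated confidence level says (for example, 90% of the time for 90% confidence intervals).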
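To keep score while practising, one simple approach is to record each interval alongside the true answer and compute the fraction of intervals that contained it. The sketch below is my own illustration, not Hubbard's procedure; the function name `hit_rate` and the numbers in `practice_round` are made up.

```python
def hit_rate(records):
    """Fraction of (low, high, truth) records whose interval contains the truth."""
    hits = sum(1 for low, high, truth in records if low <= truth <= high)
    return hits / len(records)

# Hypothetical practice round: ten intervals, each stated at 90% confidence.
# All numbers are invented for illustration only.
practice_round = [
    (10, 50, 32),
    (100, 300, 150),
    (1, 5, 7),        # miss: the truth lies above the interval
    (0, 20, 12),
    (40, 90, 41),
    (2, 8, 8),
    (500, 900, 640),
    (15, 25, 19),
    (60, 120, 75),
    (3, 30, 4),
]

rate = hit_rate(practice_round)
print(f"hit rate: {rate:.0%} over {len(practice_round)} intervals")
```

A single round of ten is far too small to draw conclusions from; the hit rate only becomes informative once it is pooled over many rounds.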
To read more and try sample questions, read the article we prepared on 80,000 Hours here.