Great book. It was percolating around CFAR a few months back - I (Dan from CFAR) read it, several other people read at least part of the book or my notes on it, and we had some conversations about it. A few things from the book that stuck out to me (although some may have been slightly distorted by memory):
For a 90% CI there is a 10% chance that the answer lies outside your estimate, and if you split this there is a 5% chance that the answer is above your upper bound and a 5% chance that the answer is below your lower bound.
This isn't always true. For example, one calibration question I've done is, "How long are all 3 extended Lord of the Rings movies back to back?"
On this, I was almost certain they were each at least 3 hours long, but I wasn't sure how much more than that they were. So, my minimum was 9 hours. I was fairly confident they weren't more than 4 hours each, so my upper was 12 (this was for a 70% interval). Almost all my uncertainty was on the upper end, while very little was on the lower.
FYI, I told the CFAR principals about How to Measure Anything, and specifically about the calibration exercises detailed in chapter 5, on September 9th of last year, at which time Anna said she had previously read the first half of the book.
But yeah, it hasn't been discussed on LW much, though it has been on my recommended books page for a long time.
Sorry Luke, I didn't want to bother you so didn't ask, but I should have guessed you would have found this :)
So, since basically everyone in the world is overconfident, you can make them better calibrated just by making them come up with an interval and then doubling it.
What I've never really got is how you become accurately calibrated at the long tails. Are there really people who can consistently give both 90% and 95% confidence intervals? To me those both just feel like "really likely", and the higher the granularity, the harder it gets - note that a 98% confidence interval should probably be twice as wide as a 95% confidence interval. Are there people who have truly internalised this?
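As a rough check on how much wider a 98% interval needs to be than a 95% one, here is a minimal sketch. It depends entirely on an assumption about the shape of your uncertainty, which the comment above doesn't specify: the two distributions below (a standard normal and a standard Cauchy) are just illustrative thin-tailed and heavy-tailed cases, and the function names are made up for this example.

```python
from math import pi, tan
from statistics import NormalDist


def normal_half_width(level):
    """Half-width of a central interval for a standard normal (assumed shape)."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def cauchy_half_width(level):
    """Half-width of a central interval for a standard Cauchy (assumed shape).

    Uses the Cauchy quantile function Q(p) = tan(pi * (p - 0.5)).
    """
    return tan(pi * level / 2)


# Ratio of the 98% interval width to the 95% interval width in each case.
normal_ratio = normal_half_width(0.98) / normal_half_width(0.95)
cauchy_ratio = cauchy_half_width(0.98) / cauchy_half_width(0.95)

print(f"normal: 98% interval is {normal_ratio:.2f}x the 95% interval")
print(f"cauchy: 98% interval is {cauchy_ratio:.2f}x the 95% interval")
```

Under these assumptions the ratio is only about 1.2 for the normal case but about 2.5 for the heavy-tailed case, so "twice as wide" is plausible only if your uncertainty has fairly fat tails.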
I personally like this two player calibration game, which I was introduced to by Paul Christiano at a meetup a couple of years ago:
There's no need to choose a minimum width confidence interval (is there a technical term for that?) e.g. "before 1920" would be an acceptable confidence interval for the question given above.
The big advantage of 50% confidence intervals over 90% confidence intervals (other than that they make a nice easy structure for the game) is that you get much faster feedback. 20 trials can meaningfully tell you that your 50% confidence intervals are off in one direction or the other. With 90% confidence intervals, 20 trials is enough to tell you if you're overconfident, but it can't tell you if you're underconfident.
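To put rough numbers on the feedback-speed point, here is a small sketch using the binomial distribution. It assumes each question is an independent trial and that a perfectly calibrated player hits with exactly the stated probability; the cutoffs (5, 15) and helper names are illustrative choices, not anything from the game itself.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k hits in n independent trials with hit rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_most(k, n, p):
    """Probability of k or fewer hits."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def prob_at_least(k, n, p):
    """Probability of k or more hits."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


n = 20

# 50% intervals: a calibrated player rarely lands far from 10 hits,
# so both "too few" and "too many" hits are informative.
p50_low = prob_at_most(5, n, 0.5)
p50_high = prob_at_least(15, n, 0.5)

# 90% intervals: "too few" hits is informative, but the most extreme
# "too many" outcome (20 out of 20) is not surprising at all.
p90_low = prob_at_most(15, n, 0.9)
p90_all = prob_at_least(20, n, 0.9)

print(f"50% intervals: P(<=5 hits) = {p50_low:.3f}, P(>=15 hits) = {p50_high:.3f}")
print(f"90% intervals: P(<=15 hits) = {p90_low:.3f}, P(20 hits) = {p90_all:.3f}")
```

With 90% intervals, even a perfect 20 out of 20 happens about 12% of the time for a calibrated player, so it can't count as evidence of underconfidence, whereas with 50% intervals either tail is unlikely enough to notice.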
The big disadvantage is that 50% confidence intervals somehow don't feel as useful as 90% confidence intervals. I'm not sure this is really true, as there's nothing special about 90% (by my reckoning 50% is about as far away from 90% as 90% is from 98%), but it feels true. Of course, it's pretty trivial to change the game so it works with intervals other than 50%, but you have to play longer, and it gets more complicated.
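One way to make the "as far away" comparison concrete is to measure distance in log-odds. That choice of scale is an assumption on my part about what the reckoning above means, and the function name is just for illustration.

```python
from math import log


def log_odds(p):
    """Natural-log odds of a probability p (assumed distance scale)."""
    return log(p / (1 - p))


# Distance between confidence levels on the log-odds scale.
gap_50_90 = log_odds(0.90) - log_odds(0.50)
gap_90_98 = log_odds(0.98) - log_odds(0.90)

print(f"50% -> 90%: {gap_50_90:.2f} nats")
print(f"90% -> 98%: {gap_90_98:.2f} nats")
```

On this scale the two gaps come out at about 2.2 and 1.7, so they are of similar size though not equal.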
This is fantastic input. Thank you very much.
I am a little skeptical of the first technique of the wheel. I thought that was something I did naturally in any case. Of course, I do need to read the book to really figure out what's happening here though.
In the book "How to Measure Anything", D. Hubbard presents a step-by-step method for calibrating your confidence intervals. He has tested it on hundreds of people, showing that it can make 90% of people almost perfect estimators within half a day of training.
I've been told that the Less Wrong and CFAR community is mostly not aware of this work, so given the importance of making good estimates to rationality, I thought it would be of interest.
(although note CFAR has developed its own games for training confidence interval calibration)
The main techniques to employ are:
To train yourself, practice making estimates repeatedly while using these techniques, until about 90% of your 90% confidence intervals contain the true answer.
To read more and try sample questions, read the article we prepared on 80,000 Hours here.