So, what do you think? Does this method seem at all promising? I'm debating with myself whether I should begin using SPIES on Metaculus or elsewhere.
I'm not super impressed tbh. I don't see "give a 90% confidence interval for x" as a question that comes up frequently (at least in the context of eliciting forecasts and estimates from humans; it comes up quite a bit in data analysis).
For example, I don't really understand how you'd use it as a method on Metaculus. Metaculus has 2 question types - binary and continuous. For binary you have to give the probability an event happens - not sure how you'd use SPIES to help here. For continuous you are effectively doing the first step of SPIES - specifying the full distribution.
If I were to make a positive case for this, it would be this: forcing people to give a full distribution results in better forecasts for sub-intervals. This seems an interesting (and plausible) claim, but I don't find anything beyond that insight especially valuable.
You could get far more rapid feedback on the usefulness of this method by using it in a calibration training.
Nice, I didn't know OpenPhil had calibration training.
SPIES is difficult to use for the calibration training: I kept running out of time when using my Python implementation. To still compare the methods, I copied some questions and gave both a confidence interval and a SPIES estimate for each. Here are the results. I've only included 5 questions, but from what I've done, SPIES seems to help me narrow my 80% confidence intervals.
1. In which year was the US Open decided for the first time by 'sudden death'?
2. In what year did Emerson Fittipaldi first win the World Championship?
3. In what year was rayon first produced in the United States?
4. When was the first Winter Olympics held?
5. In which year did Frankie Goes to Hollywood form?
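To compare the two methods on more questions, one option is to record each interval next to the true answer and compute two numbers: coverage (the fraction of intervals that contain the answer) and average width. Below is a minimal Python sketch; `score_intervals` is a hypothetical helper, and the values in the usage line are toy numbers, not results for the questions above. For 80% intervals, coverage should approach 0.8 over many questions if the forecaster is well calibrated; with only 5 questions, coverage can only be 0, 0.2, 0.4, 0.6, 0.8, or 1, so it is a noisy signal. A narrower interval is only an improvement if coverage holds up.

```python
from statistics import mean


def score_intervals(intervals, answers):
    """Return (coverage, mean_width) for a list of (low, high) intervals.

    coverage:   fraction of intervals that contain the true answer.
    mean_width: average interval width (smaller means more precise).
    """
    if len(intervals) != len(answers) or not intervals:
        raise ValueError("need one true answer per interval, and at least one interval")
    hits = [low <= answer <= high for (low, high), answer in zip(intervals, answers)]
    widths = [high - low for low, high in intervals]
    return sum(hits) / len(hits), mean(widths)


# Toy values for illustration only: the first interval contains its answer,
# the second does not.
print(score_intervals([(0, 10), (5, 6)], [3, 7]))  # (0.5, 5.5)
```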
Here, I discuss the SPIES forecasting method and ask for the community's thoughts on it.
Not too long ago, I came across the SPIES (Subjective Probability Interval Estimates) method for judgmental forecasting. The method was developed by Uriel Haran and seems to have first been published in his 2011 dissertation, "Subjective Probability Interval Estimates: A Simple and Effective Way to Reduce Overprecision in Judgment". Haran writes:
He goes on to claim that SPIES reduces the overprecision of confidence interval forecasts, and supports this claim with the results of several forecasting experiments he conducted. Participants made forecasts using one of the following methods:
1. Give a 90% confidence interval for the target value.
2. Give a lower bound and an upper bound such that they are 95% sure the target value is not below the lower bound and 95% sure it is not above the upper bound (i.e., 5th and 95th percentiles).
3. Use SPIES: divide the numerical range into several intervals, rate the likelihood of each interval on a 0-100 scale, normalize these ratings into probabilities (treating each interval's probability as spread uniformly over the values in that interval), and then find the shortest subinterval of the range that contains 90% of the cumulative probability.
For example, to forecast NYC's total rainfall for March, we might begin with the following intervals: 40-65mm, 66-90mm, 91-115mm, 116-140mm, and >140mm. (The range from 40mm to above 140mm could have been partitioned into 5, 10, or any other number of intervals; I just chose 5 for this example.) I don't know much about rainfall in NYC, but might assign these intervals likelihoods of 35/100, 55/100, 85/100, 25/100, and 5/100, respectively. Since these likelihoods sum to 205/100, my probabilities for these intervals would then be 35/205 ≈ 0.171, 55/205 ≈ 0.268, 85/205 ≈ 0.415, 25/205 ≈ 0.122, and 5/205 ≈ 0.024.
The shortest subinterval of the 40mm to >140mm range that contains 90% of this probability gives the following estimate: with 90% confidence, I believe NYC's rainfall in March will be between 40mm and 125mm. Note that I needed a small program to get this estimate. The 90% level is also somewhat arbitrary; the same distribution says I'm 75% confident that NYC's rainfall in March will be between 55mm and 115mm.
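Here is one way to compute such an interval, as a minimal Python sketch (not necessarily how Haran or the HBR tool does it). It uses the same piecewise-uniform assumption, with edges at 40, 65, 90, 115, and 140, and it caps the open-ended >140mm interval at 165mm, which is an assumption. Because the cumulative probability is then piecewise linear, some shortest interval always has at least one endpoint on an interval edge, so it is enough to try each edge as the left or the right endpoint. `spies_interval` and its helpers are hypothetical names.

```python
def spies_interval(edges, likelihoods, confidence=0.9):
    """Shortest interval holding `confidence` of the probability mass.

    edges:       increasing interval boundaries (one more than likelihoods).
    likelihoods: nonnegative 0-100 ratings, normalized here to sum to 1.
    Assumes each interval's probability is spread uniformly over it.
    """
    edges = [float(e) for e in edges]
    total = sum(likelihoods)
    probs = [w / total for w in likelihoods]
    cdf = [0.0]  # cdf[i] = probability of falling below edges[i]
    for p in probs:
        cdf.append(cdf[-1] + p)
    eps = 1e-12  # tolerance for floating-point rounding

    def smallest_x_with_cdf_at_least(q):
        for i, p in enumerate(probs):
            if cdf[i + 1] >= q - eps:
                if p == 0:
                    return edges[i]
                frac = min(max((q - cdf[i]) / p, 0.0), 1.0)
                return edges[i] + frac * (edges[i + 1] - edges[i])
        return edges[-1]

    def largest_x_with_cdf_at_most(q):
        for i in reversed(range(len(probs))):
            if cdf[i] <= q + eps:
                if probs[i] == 0:
                    return edges[i + 1]
                frac = min(max((q - cdf[i]) / probs[i], 0.0), 1.0)
                return edges[i] + frac * (edges[i + 1] - edges[i])
        return edges[0]

    best = None
    for i, x in enumerate(edges):
        candidates = []
        # Edge x as the left endpoint: extend right until `confidence` is covered.
        if cdf[i] + confidence <= 1 + eps:
            candidates.append((x, smallest_x_with_cdf_at_least(cdf[i] + confidence)))
        # Edge x as the right endpoint: extend left until `confidence` is covered.
        if cdf[i] - confidence >= -eps:
            candidates.append((largest_x_with_cdf_at_most(cdf[i] - confidence), x))
        for lo, hi in candidates:
            if best is None or hi - lo < best[1] - best[0]:
                best = (lo, hi)
    return best


edges = [40, 65, 90, 115, 140, 165]  # ">140mm" capped at 165mm (assumption)
likelihoods = [35, 55, 85, 25, 5]
for level in (0.9, 0.75):
    lo, hi = spies_interval(edges, likelihoods, level)
    print(level, round(lo, 1), round(hi, 1))
# 0.9 40.0 124.5
# 0.75 55.2 115.0
```

With these assumptions the sketch reproduces both estimates above (124.5mm rounds to 125mm). The exact endpoints can depend on where the open-ended interval is capped and on how the interval edges are treated.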
I haven't come across SPIES anywhere on LW, and first found out about it in the Harvard Business Review article "A Simple Tool for Making Better Forecasts", which contains an interactive SPIES example (forecasting temperatures in June).
So, what do you think? Does this method seem at all promising? I'm debating with myself whether I should begin using SPIES on Metaculus or elsewhere. Would anyone be interested in performing some experiments with me on using SPIES in a greater variety of forecasting situations, or perhaps in improving SPIES or in building better methods to control for overconfident forecasts?