Forecasters vary on at least three dimensions:
- accuracy - as measured by (e.g.) average Brier score over time (the Brier score is a measure of error: if you think (say) p is 0.7 likely and p turns out to be true, then your Brier score on this forecast is (1 - 0.7)^2 = 0.09; lower is better).
- calibration - how close are they to perfect calibration, where for any x, of all the statements they assign x% probability, x% turn out to be true?
- reliability - how much evidence does a given forecast of yours provide for the proposition in question being true? I think of this as: "for a given confidence level c, what's the Bayes factor P(you say the probability of x is c | x) / P(you say the probability of x is c | not-x)?"
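To make the three definitions concrete, here is a minimal sketch that estimates each one from a track record of (stated probability, outcome) pairs. It assumes outcomes are coded 1 (true) and 0 (false) and that stated probabilities are compared exactly, so they should be rounded to a fixed grid (e.g. 0.1, 0.2, ...) beforehand. The function names are made up for illustration.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean of (stated probability - outcome)^2 over all forecasts; lower is better."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def calibration_table(forecasts, outcomes):
    """For each distinct stated probability, the fraction of those statements that came true.

    Perfect calibration means every entry equals its key.
    """
    counts = defaultdict(int)
    hits = defaultdict(int)
    for f, o in zip(forecasts, outcomes):
        counts[f] += 1
        hits[f] += o
    return {f: hits[f] / counts[f] for f in sorted(counts)}


def bayes_factor(forecasts, outcomes, c):
    """Estimate P(say c | true) / P(say c | false) from observed frequencies.

    Raises ZeroDivisionError if there are no true or no false outcomes,
    or if c was never stated for a false proposition.
    """
    n_true = sum(outcomes)
    n_false = len(outcomes) - n_true
    say_c_true = sum(1 for f, o in zip(forecasts, outcomes) if f == c and o == 1)
    say_c_false = sum(1 for f, o in zip(forecasts, outcomes) if f == c and o == 0)
    return (say_c_true / n_true) / (say_c_false / n_false)


# Hypothetical track record: 10 forecasts at 90% (9 came true), 10 at 10% (1 came true).
forecasts = [0.9] * 10 + [0.1] * 10
outcomes = [1] * 9 + [0] + [1] + [0] * 9
print(brier_score(forecasts, outcomes))
print(calibration_table(forecasts, outcomes))
print(bayes_factor(forecasts, outcomes, 0.9))
```

Estimating the Bayes factor this way needs the full record, including how often each level is stated for true versus false propositions, whereas the calibration table only looks at what happened after each stated level.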
I wonder how these three properties relate to each other.
(A) Assume that you are perfectly calibrated at 90% and you say "It will rain today with 90% probability". How should I update on your claim, given that I know you are perfectly calibrated? My first intuition is that, given your perfect calibration, P(you say rain with 90% | rain) is 90% and P(you say rain with 90% | no rain) is 10%. But that doesn't follow from the fact that you are perfectly calibrated, does it? Does your calibration have any bearing at all on your reliability (apart from the fact that both correlate positively with forecasting competence)? If it doesn't, why do we care about being calibrated?
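One way to probe question (A) numerically is a made-up forecaster who only ever says 10% or 90% and is perfectly calibrated at both levels. The day counts below are invented purely for illustration; the code checks calibration and then computes the two conditional probabilities from the intuition above.

```python
# Hypothetical 100-day record: stated probability of rain -> (rainy days, dry days).
record = {
    0.9: (18, 2),   # said 90% on 20 days; it rained on 18 of them (90%)
    0.1: (8, 72),   # said 10% on 80 days; it rained on 8 of them (10%)
}

rain_days = sum(r for r, d in record.values())  # 26
dry_days = sum(d for r, d in record.values())   # 74

# Perfect calibration: at each stated level c, the rain frequency is exactly c.
for c, (r, d) in record.items():
    assert abs(r / (r + d) - c) < 1e-12

r90, d90 = record[0.9]
p_say90_given_rain = r90 / rain_days  # 18/26
p_say90_given_dry = d90 / dry_days    # 2/74
bayes_factor = p_say90_given_rain / p_say90_given_dry

print(f"P(say 90% | rain)    = {p_say90_given_rain:.3f}")  # 0.692
print(f"P(say 90% | no rain) = {p_say90_given_dry:.3f}")   # 0.027
print(f"Bayes factor         = {bayes_factor:.2f}")        # 25.62
```

In this invented record the forecaster is calibrated, and rain follows a 90% forecast 18:2 = 9:1, yet the two conditional probabilities come out near 0.69 and 0.03 rather than 0.9 and 0.1.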
(B) How does accuracy relate to reliability? Can I infer something about your reliability from knowing your Brier score over time?
I am extremely interested in these sorts of questions myself (message me if you would like to chat more about them). On the relation between accuracy and calibration, I think you can see some of it in Open Philanthropy's report on the quality of their predictions. In footnote 10, I believe they decompose the Brier score into a term for miscalibration, a term for resolution, and a term for entropy.
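I haven't checked the exact form used in that footnote, so the sketch below is the standard Murphy decomposition for binary outcomes, under the assumption that it is the same idea. There, the mean Brier score equals miscalibration minus resolution plus a third term, ō(1 - ō) for base rate ō, usually called "uncertainty". One naming trap: in the forecasting literature the miscalibration term is traditionally called "reliability", which is not the Bayes-factor notion of reliability defined at the top of this post. Grouping by exact stated probability (no binning) makes the identity hold exactly.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes):
    """Split the mean Brier score into (miscalibration, resolution, uncertainty).

    brier == miscalibration - resolution + uncertainty, exactly, because
    forecasts are grouped by their exact stated probability.
    Outcomes must be coded 1 (true) / 0 (false).
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    miscalibration = 0.0
    resolution = 0.0
    for f, group in groups.items():
        n_k = len(group)
        freq_k = sum(group) / n_k  # how often statements made at level f came true
        miscalibration += n_k * (f - freq_k) ** 2        # gap between stated and observed
        resolution += n_k * (freq_k - base_rate) ** 2    # spread away from the base rate
    miscalibration /= n
    resolution /= n
    uncertainty = base_rate * (1 - base_rate)  # variance of the 0/1 outcomes
    return miscalibration, resolution, uncertainty


# Hypothetical track record, for illustration only.
forecasts = [0.9, 0.9, 0.9, 0.6, 0.6, 0.2, 0.2, 0.2]
outcomes = [1, 1, 0, 1, 0, 0, 0, 1]
rel, res, unc = murphy_decomposition(forecasts, outcomes)
brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)
print(brier, rel - res + unc)  # equal up to floating-point rounding
```

Since the uncertainty term depends only on the outcomes, two forecasters scored on the same questions differ in Brier score only through miscalibration and resolution.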
Also, could you explain a bit how it would be possible for someone who is perfectly calibrated at predicting rain to predict rain with 90% probability, but for the Bayes factor based on that information not to be 9? To me it seems that, for someone to be perfectly calibrated at the 90% confidence level, the ratio of rainy to dry days whenever they predict 90% rain has to be 9:1, so P(say rain 90% | rain) = 90% and P(say rain 90% | no rain) = 10%?
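Bayes' theorem in odds form gives an identity that may help check the step from the 9:1 ratio to the two conditional probabilities. Writing S for "you say rain with 90%" and R for "rain" (labels introduced here only for brevity), the second equality below uses perfect calibration at 90%, i.e. P(R | S) = 0.9:

```latex
\frac{P(S \mid R)}{P(S \mid \neg R)}
  = \frac{P(R \mid S) \,/\, P(\neg R \mid S)}{P(R) \,/\, P(\neg R)}
  = \frac{0.9 / 0.1}{P(R) / P(\neg R)}
  = 9 \cdot \frac{P(\neg R)}{P(R)}
```

On this identity, calibration pins down the 9:1 odds of rain given the forecast, and the Bayes factor equals 9 exactly when the base rate P(R) is 50%. Separately, P(S | R) = P(R | S) P(S) / P(R), so its value also depends on how often 90% is said at all.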
Hey, thanks for the answer, and sorry for my very late response. Thanks in particular for the link to the OpenPhil report, very interesting! To your question: I have now changed my mind again and tentatively think that you are right. Here's how I think about it now, though I still feel unsure whether I made a reasoning error somewhere:
There's some distribution of your probabilistic judgments that shows how frequently you report a given probability in a proposition that turned out to be true. It might show e.g. that for true propositions you report 90% probability ...