In A Technical Explanation of Technical Explanation, Eliezer writes,
You should only assign a calibrated confidence of 98% if you're confident enough that you think you could answer a hundred similar questions, of equal difficulty, one after the other, each independent from the others, and be wrong, on average, about twice. We'll keep track of how often you're right, over time, and if it turns out that when you say "90% sure" you're right about 7 times out of 10, then we'll say you're poorly calibrated.
...
What we mean by "probability" is that if you utter the words "two percent probability" on fifty independent occasions, it better not happen more than once
...
If you say "98% probable" a thousand times, and you are surprised only five times, we still ding you for poor calibration. You're allocating too much probability mass to the possibility that you're wrong. You should say "99.5% probable" to maximize your score. The scoring rule rewards accurate calibration, encouraging neither humility nor arrogance.
So I have a question: isn't this an endorsement of frequentism? I don't think I fully understand, but isn't counting the relative frequency of correct answers exactly frequentist methodology? How could this be Bayesian?
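The calibration test in the quoted passage can be sketched as a quick simulation. This is only an illustration under assumed numbers (a forecaster who says "98% probable" but is actually right 99.5% of the time), and it uses the logarithmic scoring rule as one example of a proper scoring rule that rewards honest reporting, as the quote describes:

```python
import math
import random

# Hypothetical setup: a forecaster says "98% probable" on n questions,
# but their true per-question accuracy is 99.5% (the quote's scenario).
random.seed(0)
n = 1000
true_accuracy = 0.995
correct = sum(random.random() < true_accuracy for _ in range(n))
print(f"wrong {n - correct} times out of {n}")  # surprised only a handful of times

# Under the logarithmic scoring rule, expected score is maximized by
# reporting your true probability, so saying 0.98 when you are right
# 99.5% of the time costs you points.
def expected_log_score(stated, true_p):
    """Expected log score for reporting `stated` when the true rate is `true_p`."""
    return true_p * math.log(stated) + (1 - true_p) * math.log(1 - stated)

print(expected_log_score(0.98, 0.995))   # saying "98%": lower expected score
print(expected_log_score(0.995, 0.995))  # saying "99.5%": higher expected score
```

The second part shows the "ding for poor calibration" concretely: the expected score is strictly highest when the stated probability equals the actual long-run frequency, which is why the quote says the rule encourages neither humility nor arrogance.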
Good article on the abuse of p-values: http://www.sciencenews.org/view/feature/id/57091/title/Odds_are,_its_wrong