I recently spent a while looking at how different people had designed their probability calibration exercises (for ideas on how to design my own), and they turned out to be quite difficult to find. Many of the best ones were the least advertised and hardest to locate online. I figured I'd compile them all here in case anyone else finds themselves in a similar position. Please let me know about any I missed and I'll add them to the post. Many of these are old and no longer maintained, so no guarantees as to quality.
http://acritch.com/credence-game/
http://confidence.success-equation.com/
https://calibration-practice.neocities.org/
http://web.archive.org/web/20100529074053/http://www.acceleratingfuture.com/tom/?p=129
http://credencecalibration.com/
https://programs.clearerthinking.org/calibrate_your_judgment.html
https://www.openphilanthropy.org/calibration or https://80000hours.org/calibration-training/ (Different URLs for the same application.)
https://calibration.lazdini.lv/
http://web.archive.org/web/20161020032514/http://calibratedprobabilityassessment.org/
https://predictionbook.com/credence_games/try
https://calibration-training.netlify.app/
https://play.google.com/store/apps/details?id=com.the_calibration_game
https://www.metaculus.com/tutorials/
https://outsidetheasylum.blog/probability-calibration/
Metaculus has a calibration tutorial too: https://www.metaculus.com/tutorials/
I've been thinking about adding a calibration exercise to https://manifold.markets as well, so I'm curious: what makes one particular set of calibration exercises more valuable than another? Better UI? Interesting questions? Legible or shareable results?
When a question is about a topic I know nothing about, I just put the maximum-entropy distribution on it. That's fine if it's rare, but it leads to unhelpful results if such questions make up a large proportion of the test. Most calibration tests I found pulled from generic trivia categories such as sports, politics, celebrities, science, and geography. I didn't find many that were domain-specific, so that might be a good area to focus on.
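To make the max-entropy point concrete, here is a rough sketch in Python. It assumes binary (two-option) questions and uses made-up results; the function names `brier_score` and `calibration_table` are just illustrative and not taken from any of the sites above.

```python
from collections import defaultdict

def brier_score(forecasts):
    """Mean squared error between stated probability and outcome (1 if right, 0 if wrong)."""
    return sum((p - float(correct)) ** 2 for p, correct in forecasts) / len(forecasts)

def calibration_table(forecasts):
    """Group answers by stated probability; return {p: (count, fraction_correct)}."""
    buckets = defaultdict(list)
    for p, correct in forecasts:
        buckets[p].append(correct)
    return {p: (len(v), sum(v) / len(v)) for p, v in sorted(buckets.items())}

# Hypothetical data: (probability assigned to the chosen answer, whether it was right).
known = [(0.9, True), (0.9, True), (0.8, True), (0.8, False), (0.7, True)]
unknown = [(0.5, True), (0.5, False)] * 10  # pure guesses on unfamiliar topics

print(brier_score(unknown))                # 0.25
print(calibration_table(unknown))          # {0.5: (20, 0.5)}
print(calibration_table(known + unknown))  # the 0.5 bucket dominates the sample
```

On a two-option question a 50% answer scores 0.25 whether it turns out right or wrong, so those answers carry no information about my calibration. If most of the test lands in the 0.5 bucket, the 70% to 90% buckets end up with too few answers to say much.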
Some of them don't tell me what the right answers are at the end, or even which questions I got wrong, whi...