Hi all,
I put this calibration test together this morning. It pulls from a trivia API of over 150,000 questions, so you should be able to take it many, many times before you start seeing repeats.
http://www.2pih.com/caltest.php
A few notes:
1. The questions are "Jeopardy"-style, so the wording may be strange, and some of them might be impossible to answer without further context. For these, just assign 0% confidence.
2. As the questions are open-ended, there is no answer-checking mechanism. You have to be honest with yourself about whether or not you got the right answer. After all, what would be the point of cheating at a calibration test?
I can't think of anything else. Please let me know if there are any features you would like to see added, or if you run into any bugs or other issues.
**EDIT**
As suggested, I have moved this to the main section. Here are the changes I'll be making soon:
- Label the axes and include an explanation of calibration curves.
- Make it so you can reverse your last selection in the event of a misclick.
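For anyone unfamiliar with calibration curves: one common way to build one is to group your answers by the confidence you picked and compare each confidence level with the fraction of those answers you actually got right. A perfectly calibrated taker's points fall on the diagonal y = x. The sketch below is a minimal illustration, not the site's actual code; it assumes results are kept as hypothetical `(confidence, correct)` pairs, and the confidence levels in the sample data are made up.

```python
from collections import defaultdict


def calibration_curve(results):
    """Group (confidence, correct) pairs by confidence level.

    Returns (stated confidence, observed fraction correct, number of answers)
    for each level, sorted by confidence. Plotting observed against stated
    gives the calibration curve; perfect calibration lies on the line y = x.
    """
    buckets = defaultdict(lambda: [0, 0])  # confidence -> [number correct, number answered]
    for confidence, correct in results:
        buckets[confidence][0] += int(correct)
        buckets[confidence][1] += 1
    return [(conf, right / total, total) for conf, (right, total) in sorted(buckets.items())]


# Hypothetical self-graded results; the confidence levels are illustrative only.
results = [(0.5, True), (0.5, False), (0.9, True), (0.9, True), (0.9, False), (0.999, True)]
for conf, observed, n in calibration_curve(results):
    print(f"stated {conf:.1%}, observed {observed:.1%} (n={n})")
```

Here the 90% answers come out at about 66.7% correct, which on the graph would show up as a point below the diagonal (overconfidence at that level), though with only three answers it says very little.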
Here are changes I'll make eventually:
- Create an account system so you can store your results online.
- Move the trivia DB over to my own server to allow flagging of bad/unanswerable questions.
Here are the changes that are done:
- Changed 0% to 0.1% and 99% to 99.9%.
- Added a second graph that shows the frequency of your confidence selections.
- Color-coded the "right" and "wrong" buttons and moved them farther apart to prevent misclicks.
- Added local storage of your results so that you can see your calibration over time.
- Added a check that skips a question if it is blank.
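The second graph, the frequency of confidence selections, amounts to a normalized count of how often each confidence level was picked. A minimal sketch, assuming (hypothetically) that each answer's confidence is stored as a plain float; this is an illustration, not how the site itself computes it.

```python
from collections import Counter


def confidence_frequencies(confidences):
    """Return the fraction of answers given at each confidence level, sorted by level."""
    counts = Counter(confidences)
    total = sum(counts.values())
    return {level: counts[level] / total for level in sorted(counts)}


# Hypothetical confidence selections from one session.
print(confidence_frequencies([0.5, 0.9, 0.9, 0.999]))  # {0.5: 0.25, 0.9: 0.5, 0.999: 0.25}
```

Showing this next to the calibration curve helps indicate which points on the curve rest on many answers and which rest on only a handful.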
Thanks! BTW, I'd prefer to have 0.1%, 1%, 99%, and 99.9% all available as options, rather than skipping over the 1% and 99% options as the test does now.
I considered that, but I think, at least for now, it would overcomplicate things for not much benefit. Subjectively, out of 100 questions there are maybe 10 that I would assign the highest possible confidence. Of those, I'd say only 1 would be a question where I'd pick 99% confidence instead of, say, 99.9%, if 99% were available.
So, assuming (incorrectly) that I'm perfectly calibrated, it would take about 7000 questions to have a >50% chance of seeing a meaningful difference between the two confidence levels.
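One way to reproduce the ~7000 figure, under an assumption not spelled out above: treat a "meaningful difference" as seeing at least one wrong answer among the 99%-level answers. With perfect calibration each such answer is wrong with probability 0.01, so the chance of no misses in n of them is 0.99^n, and we need the smallest n with 0.99^n < 0.5. With about 1 such answer per 100 questions (10 top-confidence answers per 100, 1 in 10 of those at 99%), the total follows directly. The function name and parameters below are hypothetical.

```python
import math


def questions_needed(miss_rate=0.01, questions_per_answer=100, target=0.5):
    """Estimate how many questions it takes to likely see a miss at one confidence level.

    Assumes (hypothetically) that a "meaningful difference" between 99% and 99.9%
    means seeing at least one wrong answer among the 99%-level answers.

    miss_rate: chance of a wrong answer at that level if perfectly calibrated
               (0.01 for a 99% answer)
    questions_per_answer: total questions taken per answer given at that level
                          (about 100 here: 10 top-confidence answers per 100
                          questions, 1 in 10 of them at the 99% level)
    target: required probability of seeing at least one miss
    """
    # P(no misses in n answers) = (1 - miss_rate) ** n, so we need the smallest n
    # with (1 - miss_rate) ** n < 1 - target, i.e. n > log(1 - target) / log(1 - miss_rate).
    n_answers = math.floor(math.log(1 - target) / math.log1p(-miss_rate)) + 1
    return n_answers, n_answers * questions_per_answer


n_answers, n_questions = questions_needed()
print(n_answers, n_questions)  # 69 6900
```

That gives 69 answers at the 99% level, or roughly 6900 questions in total, consistent with "about 7000". At 99.9% the expected number of misses over those same 69 answers would be only about 0.07, so in most runs the two levels would look identical.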