Hi all,
I put this calibration test together this morning. It pulls from a trivia API of over 150,000 questions so you should be able to take this many, many times before you start seeing repeats.
http://www.2pih.com/caltest.php
A few notes:
1. The questions are "Jeopardy"-style, so the wording may be strange, and some of them might be impossible to answer without further context. On these, just assign 0% confidence.
2. As the questions are open-ended, there is no answer-checking mechanism. You have to be honest with yourself about whether you got the right answer. After all, what would be the point of cheating at a calibration test?
I can't think of anything else. Please let me know if there are any features you would want to see added, or if there are any bugs, issues, etc.
**EDIT**
As suggested, I have moved this to the main section. Here are the changes I'll be making soon:
- Label the axes and include an explanation of calibration curves.
- Make it so you can reverse your last selection in the event of a misclick.
Here are changes I'll make eventually:
- Create an account system so you can store your results online.
- Move trivia DB over to my own server to allow for flagging of bad/unanswerable questions.
Here are the changes that are done:
- Changed 0% to 0.1% and 99% to 99.9%.
- Added a second graph which shows the frequency of your confidence selections.
- Color-coded the "right" and "wrong" buttons and moved them farther apart to prevent misclicks.
- Results are now stored locally so that you can see your calibration over time.
- Blank questions are now detected and skipped.
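For anyone curious what the two graphs summarize, here is a rough sketch in Python. It is not the site's actual code: the function name `calibration_curve` and the `(confidence, was_correct)` input format are assumptions made up for illustration. It groups answers by the confidence you selected, then reports the fraction you got right (the calibration curve) and how often you picked that level (the frequency graph).

```python
from collections import defaultdict

def calibration_curve(results):
    """Summarize self-graded answers by stated confidence.

    results: iterable of (confidence, was_correct) pairs, with confidence
    in [0, 1]. This input format is an assumption for illustration only.
    Returns {confidence: (fraction_correct, times_selected)}.
    """
    buckets = defaultdict(lambda: [0, 0])  # confidence -> [num_correct, num_total]
    for confidence, was_correct in results:
        buckets[confidence][0] += int(was_correct)
        buckets[confidence][1] += 1
    return {
        confidence: (correct / total, total)
        for confidence, (correct, total) in sorted(buckets.items())
    }

# Hypothetical session: three answers at 90%, two at 50%, one at 0.1%.
session = [(0.9, True), (0.9, True), (0.9, False),
           (0.5, True), (0.5, False), (0.001, False)]
curve = calibration_curve(session)
```

If you are well calibrated, the fraction correct in each bucket should sit close to the confidence you stated for it, so here the 90% bucket (2 of 3 right) would be a little overconfident, though with so few answers that tells you very little.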
It's possible to be fairly certain that you haven't thought of a correct answer (even if you can't be certain that you don't know it): you have no answer in mind, and you aren't treating "this is a trick question" or "there is no correct answer" as your answer. Should that case be representable, which would make "0%" a legitimate option to include, or am I confused?
I got one blank question, which I think was a loading error, since the answer shown was the same as for the previous question and the question after it took a couple of seconds to appear on screen.
I'd prefer not to allow 0 and 1 as available credences. If 0 remained as an option, I would just interpret it as "very close to 0" and keep using the app. However, if a future version of the app showed me my Bayes score, then the difference between what the app allows me to choose (0%) and what I'm interpreting 0 to mean ("very close to 0") could matter.
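To make the last point concrete, here is a small sketch, assuming "Bayes score" means the logarithmic scoring rule (the log of the probability you assigned to what actually happened). The function name `log_score` is made up for illustration and is not something the app provides.

```python
import math

def log_score(confidence, was_correct):
    """Logarithmic score for one answer (an assumed reading of "Bayes score").

    confidence: stated probability, in [0, 1], that your answer is right.
    Returns log(probability assigned to the actual outcome); 0 is the best
    possible score and more negative is worse.
    """
    p = confidence if was_correct else 1.0 - confidence
    if p <= 0.0:
        # You assigned zero probability to what actually happened.
        return float("-inf")
    return math.log(p)

exact_zero = log_score(0.0, True)    # said 0%, turned out to be right
near_zero = log_score(0.001, True)   # said 0.1%, turned out to be right
```

Under this rule, a single right answer at a literal 0% gives a score of negative infinity, which no number of later good answers can offset, while the same answer at 0.1% costs a large but finite penalty (about -6.9). That is why a literal 0% and "very close to 0" stop being interchangeable once a score like this is shown.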