I think the problem is that with many trivia questions you either know the answer or you don't. The dominant factor in my results so far is that when I have no answer in mind, I assign 0% probability to being right and am correctly calibrated there, while all of my answers at every other level of certainty have turned out right, so my calibration curve looks almost rectangular.
I might just be getting accurate information that I'm drastically underconfident, but I think this might be one of the worse types of question to calibrate on. Even if the problem is just that I'm drastically underconfident on trivia questions and shouldn't assign less than 50% probability to any answer I actually have, that seems sufficiently unrepresentative of most areas where you need calibration, and of how most people perform on other calibration tests, to make this a pretty bad measure of calibration.
Perhaps it would be better as a multiple choice test, so one can have possible answers raised to attention that may or may not be right, and assign probabilities to those?
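To make the "almost rectangular" curve concrete, here is a minimal sketch of how a calibration curve could be tallied from self-scored answers. The function name `calibration_curve` and the sample data are hypothetical, made up to mimic the pattern described above; this is not how the linked test computes anything.

```python
from collections import defaultdict


def calibration_curve(results):
    """Map each stated confidence (in percent) to the observed hit rate.

    `results` is an iterable of (stated_confidence, was_correct) pairs.
    """
    buckets = defaultdict(list)
    for confidence, correct in results:
        buckets[confidence].append(correct)
    # Fraction of correct answers at each stated confidence level.
    return {c: sum(v) / len(v) for c, v in sorted(buckets.items())}


# Hypothetical data: no answer at 0%, and every answer at other levels is right.
sample = [(0, False)] * 5 + [(30, True)] * 2 + [(60, True)] * 3 + [(90, True)] * 4
curve = calibration_curve(sample)
print(curve)  # {0: 0.0, 30: 1.0, 60: 1.0, 90: 1.0}
```

A well-calibrated curve would sit near the diagonal (30% stated, about 30% observed); here it jumps from 0 straight to 1, which is the rectangular shape.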
My favorite calibration tools have been ones where each question had a numerical answer and you had to give a 50% confidence interval or a 90% confidence interval.
For example, a question would be "How many stairs are there in the Statue of Liberty?" My 50% interval would be 400-1000, and my 90% interval would be 200-5000.
Looking up the answer, it was 354, so I would mark my 50% interval as wrong and my 90% interval as right.
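The interval scoring in that example can be sketched in a few lines. The helper name `score_intervals` is an assumption for illustration; the numbers are the ones from the Statue of Liberty example.

```python
def score_intervals(truth, intervals):
    """Return, for each confidence level, whether the interval contains the truth.

    `intervals` maps a confidence level (in percent) to a (low, high) pair.
    """
    return {
        level: low <= truth <= high
        for level, (low, high) in intervals.items()
    }


# The Statue of Liberty example: the true answer was 354.
stairs = score_intervals(354, {50: (400, 1000), 90: (200, 5000)})
print(stairs)  # {50: False, 90: True}
```

Over many questions, a calibrated answerer's 50% intervals should contain the truth about half the time and the 90% intervals about nine times in ten.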
Hi all,
I put this calibration test together this morning. It pulls from a trivia API of over 150,000 questions so you should be able to take this many, many times before you start seeing repeats.
http://www.2pih.com/caltest.php
A few notes:
1. The questions are "Jeopardy" style questions so the wording may be strange, and some of them might be impossible to answer without further context. On these just assign 0% confidence.
2. As the questions are open-ended, there is no answer-checking mechanism. You have to be honest with yourself about whether you got the right answer; after all, what would be the point of cheating at a calibration test?
I can't think of anything else. Please let me know if there are any features you would want to see added, or if there are any bugs, issues, etc.
**EDIT**
As per suggestion I have moved this to the main section. Here are the changes I'll be making soon:
Here are changes I'll make eventually:
Here are the changes that are done: