Summary: A game of trivia where you answer factual questions about the world while also stating how sure you are that you’re right, with the goal of being well calibrated.
Tags: Large, Repeatable
Purpose: Calibration Trivia is designed to practice proper calibration – recognizing when you're very sure of something vs when you aren't very sure of it.
Materials: At minimum, you need a list of trivia questions and some writing implements for your audience. If you and your audience have smartphones, I suggest making use of Fatebook.io (if you're using true/false questions) or a Google Form (if you have a spreadsheet set up to score it). In both cases, a timer can be useful to time each question, though it's perfectly acceptable to just advance to the next question after what feels like a couple of minutes or when it looks like most people are done.
Announcement: We’re planning to host a trivia game with a twist! If you’ve never been to a trivia night before, the person running it will call out questions, we'll write our answers, and a good time is had by all. In addition to answering the question, however, you'll be able to write down how confident you are in your guess, and at the end we check if you're well calibrated – that is, do you know when you do and do not know the answer? Categories are Literature, Math and Science, History, Sports, and Tabletop Roleplaying Games.
Please bring a smartphone or similar device, as you'll need it to enter your answers!
Note: You should make sure to change the categories to match whatever you're using. You should also remove the smartphone line if you're using another method, such as having people write down their answers and hand them to you.
Description:
1. Describe the following rules to the participants.
"This is a game of trivia, with a special tweak. For anyone unfamiliar, the way trivia works is that I'll present a question, and you'll have a couple of minutes to write down an answer. Then I'll reveal the answer, and if you got it right then you'll get one point. Feel free to chat with each other once you're done guessing and while you're waiting for the next question."
"The tweak is, in addition to writing your answer down, you will also write down how confident you are that your answer is correct in the form of a percentage. If you are very confident, you might write 95, which means if you were this sure about twenty questions you'd expect to only get one of them wrong. If you were guessing wildly, you might write down 1, which means if you were that uncertain about a hundred things, you think you'd get one of them right mostly by coincidence. You'll be scored on calibration according to what's called a Brier Score, which is a Strictly Proper Scoring Rule for predictions – that means that you want to give your actual estimation of how likely you are to be right. You'll do generally do worse if you try and answer higher or lower than your actual estimation. Does anyone have any questions?"
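Note: The scoring mechanism suggested is (1-their probability)^2 if they're right, and (0-their probability)^2 if they're wrong, with the probability written as a decimal. Average the scores from each question together; lower is better. Someone who correctly answered with a 90% confidence gets scored (1-.9)^2=.01, while someone who answered incorrectly with the same confidence gets (0-.9)^2=.81. The best theoretical Brier Score would be 0, which would take being 100% confident and right on every question, so it's effectively impossible to achieve, but one can try to get close.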
2. One at a time, read each question aloud. (A collection of questions is included below, under "Calibration Trivia Questions.") Be sure to speak clearly and loudly enough for everyone to hear. If you happen to have a projector or screen, it can help to put the question up there as well.
Every six questions, announce or display the current points and scores. If you have a very large crowd, it can speed things up to only announce the top five for Correct Answers and the top five for Best Calibrated. In both cases, I suggest it's more fun to announce from the bottom up, starting with the worst scorer and ending with the best.
Repeat until the entire set of questions has been worked through.
3. Announce the final points and scores.
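If you're scoring by hand or in your own script rather than a spreadsheet, the scoring rule from the note in step 1 and the two leaderboards can be sketched roughly as below. This is a minimal sketch, not the example scoresheet's actual formulas: the `sheets` layout, the function names, and the player names are all made up for illustration, and confidences are assumed to be entered as decimals (90% as 0.9).

```python
from statistics import mean


def brier_score(confidences, outcomes):
    """Average of (outcome - confidence)^2 over all questions; lower is better.

    confidences: stated probabilities as decimals in [0, 1].
    outcomes: True where the answer was right, False where it was wrong.
    """
    return mean([(float(right) - p) ** 2 for p, right in zip(confidences, outcomes)])


def standings(sheets, top_n=5):
    """Build the Correct Answers and Best Calibrated boards.

    sheets: hypothetical dict of player name -> list of
            (confidence, was_right) pairs, one pair per question.
    Each board holds the top_n players, ordered worst-first so it can
    be read out from the bottom up.
    """
    points = {name: sum(right for _, right in rows) for name, rows in sheets.items()}
    brier = {
        name: brier_score([p for p, _ in rows], [r for _, r in rows])
        for name, rows in sheets.items()
    }
    # Most points is best; lowest Brier score is best.
    best_correct = sorted(points, key=points.get, reverse=True)[:top_n]
    best_calibrated = sorted(brier, key=brier.get)[:top_n]
    # Reverse each board so the announcer ends on the winner.
    return (
        [(name, points[name]) for name in reversed(best_correct)],
        [(name, brier[name]) for name in reversed(best_calibrated)],
    )


# Made-up answers for two players over three questions.
sheets = {
    "Ada": [(0.9, True), (0.6, False), (0.8, True)],
    "Ben": [(0.5, True), (0.5, False), (0.5, False)],
}
correct_board, calibration_board = standings(sheets)
for name, score in calibration_board:
    print(f"{name}: Brier score {score:.3f}")
```

Running the same function on the answers collected so far gives the interim standings to announce every six questions.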
Notes: You'll want a venue where you can talk loud enough for everyone to hear you. You may also want to adjust the question list or the number of questions based on the interests of your group or how long you wish the event to run.
Calibration Trivia Questions: Calibration Trivia Sets, example scoresheet 1
Variations: Brier scores are used to judge between two binary options, correct or incorrect. Here, I'm abusing it a bit by having people write their answer from all the possibilities, then guess if they're right or wrong. The easy patch is to make all the questions in the form of statements, and then ask if those statements are true or false. In the Calibration Trivia Sets, any set marked TF is in the form of statements which are either true or false, meaning people just need to answer with their confidence in its truth.
(If you're using Fatebook, I suggest setting up the questions in a tournament, clicking the option to hide other people's answers, and making the question titles just "Trivia Question 1" and so on, then displaying the text of the question on a projector.)
Another variation in how to write question sets is to make all questions have a numerical answer, and then ask for a range. You can score this by having the narrowest correct range win, or ask for 90% confidence intervals and see how often people are right.
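Both ways of scoring the range variation are simple to compute. The sketch below is one possible reading of them, with hypothetical function names and made-up player ranges: `interval_hit_rate` reports how often a player's ranges contained the true value (someone giving well-calibrated 90% intervals should land near 0.9 over many questions), and `narrowest_correct` picks the winner of a single question.

```python
def interval_hit_rate(intervals, truths):
    """Fraction of a player's (low, high) ranges that contain the true value."""
    hits = [low <= truth <= high for (low, high), truth in zip(intervals, truths)]
    return sum(hits) / len(hits)


def narrowest_correct(ranges_by_player, truth):
    """Name of the player whose range contains truth and is narrowest.

    Returns None if nobody's range contains the true value.
    """
    widths = {
        name: high - low
        for name, (low, high) in ranges_by_player.items()
        if low <= truth <= high
    }
    return min(widths, key=widths.get) if widths else None


# One player's ranges for three illustrative questions: boiling point of
# water in Celsius, year of the first crewed Moon landing, bones in an
# adult human body.
hit_rate = interval_hit_rate(
    [(90, 110), (1960, 1965), (150, 250)],
    [100, 1969, 206],
)

# Three players' ranges for the Moon landing question.
winner = narrowest_correct(
    {"Ada": (1960, 1975), "Ben": (1968, 1970), "Cy": (1950, 1960)},
    1969,
)
print(f"hit rate {hit_rate:.2f}, narrowest correct range: {winner}")
```

A handful of questions is too few to judge a 90% target, so the hit rate is better read over a whole set than question by question.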
Outrangeous and Breaking Rank are trivia games in their own right, not just variations of Calibration Trivia. That said, if you want something like Calibration Trivia but different, or you want a format where you don't need to have a set of questions prepared in advance, give them a try!
Notes: I advise giving several minutes for each question, longer than is needed to just write down the answer. Some people will spend more time thinking than you expect. Often people who have finished writing their answer will talk and socialize with each other in the gaps.
There's actually a big problem with using Brier scores for open-ended questions like this, which is that the optimal option if you're, say, 50% confident you have the right answer, is to instead report "Don't know / bleeblabloo, probability 0.0001". Then you get a good Brier score for knowing you would be wrong.
We ran this at our meetup today and it was the subject of much discussion. A big conclusion seemed to be that Brier scores work best when there is a fixed, limited number of possibilities to guess from; when the number of possibilities is large/unknown and you can guess "I don't know," you get this bad behavior.
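To put rough numbers on that behavior, here is a small sketch of the expected Brier score per question. The function name and the specific probabilities are assumptions chosen for illustration. It shows that when you have to commit to an answer, reporting your real belief is the best you can do, but that writing a deliberately wrong answer with near-zero confidence scores far better than an honest 50% guess.

```python
def expected_brier(report, chance_right):
    """Expected per-question Brier score (lower is better).

    report: the confidence you write down, as a decimal.
    chance_right: how likely your written answer actually is to be right.
    """
    return chance_right * (1 - report) ** 2 + (1 - chance_right) * report ** 2


# With a fixed answer that is 70% likely to be right, search the reports
# 0.00, 0.01, ..., 1.00 for the one with the lowest expected score.
grid = [i / 100 for i in range(101)]
best_report = min(grid, key=lambda r: expected_brier(r, 0.7))
print(best_report)  # the honest report of 0.7 wins

# Honest play on an open-ended question you're 50% sure about.
honest = expected_brier(0.5, 0.5)

# The exploit: write nonsense that is essentially never right and report
# a confidence of 0.0001 in it.
exploit = expected_brier(0.0001, 0.0)

print(f"honest: {honest:.4f}, exploit: {exploit:.1e}")
```

The honest 50% guess costs 0.25 per question in expectation, while the nonsense answer costs about 0.00000001, which is why the incentive shows up whenever people are free to pick an answer they know is wrong.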
We came up with a kind of hacky solution that gave you negative points for wrong answers and positive points for right ones, scaled to the probability you gave, plus regular Brier scores for the True/False questions. It's unlikely that solution was a proper scoring rule, but it was somewhat better in removing the incentive to always guess "[wrong answer] with probability epsilon."
The quick hack I'd use if I didn't want people to be able to easily guess wrong with high certainty would be to use True/False or multiple choice questions. That said, I don't currently think of this as a big problem?
There are two scores; Calibration and Correct Answers. If someone has remarkably good calibration and almost no correct answers, then they're probably deliberately guessing outlandish answers and being sure that they're wrong. That's not worth bragging rights, it's the equivalent of running to the side of the obstacles on an obstacle course.