MTGandP comments on 2014 Survey Results - Less Wrong
Calibration Score
Using a log scoring rule, I calculated a total accuracy+calibration score for the ten questions together. One issue: this assumes the questions are binary when they're not. Someone who is 0% sure that Thor is the right answer to the mythology question gets the same score (0) as someone who is 100% sure that Odin is the right answer. I ignored infinitely low scores for the correlation part.
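A minimal sketch of the scoring rule as described above (the function name and interface are my own, not the survey spreadsheet's): each question is binarized, so the score depends only on the probability the respondent effectively put on the correct answer.

```python
import math

def binary_log_score(p_correct):
    """Base-10 log score for one question, treated as binary.

    p_correct is the probability the respondent effectively assigned
    to the correct answer. A perfect answer scores 0, anything less
    is negative, and 0% on the correct answer scores -infinity.
    """
    if p_correct == 0:
        return float("-inf")
    return math.log10(p_correct)

# The binarization issue described above: someone 100% sure of Odin
# (correct) scores log10(1) = 0, and someone 0% sure of Thor (wrong)
# effectively puts 100% on "not Thor", which also scores 0.
print(binary_log_score(1.0))      # 0.0
print(binary_log_score(1 - 0.0))  # 0.0
```

The infinitely low scores mentioned above come from `p_correct == 0`, which is why those respondents had to be set aside for the correlation.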
I replicated the MWI correlation, but I noticed something weird: all of the really low scorers gave really low probabilities to MWI. The worst scorer had a score of -18, which corresponds to giving about 1.6% probability to the right answer on average. What appears to have happened is that they misunderstood the survey and answered in fractions instead of percents: they got 9 out of 10 questions right, but lost 2 points every time they assigned 1% or slightly less than 1% to the right answer (i.e. they meant to express near-certainty by writing 1 or 0.99), and only lost 0.0013 points when they assigned 0.3% probability to a wrong answer.
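The arithmetic behind that worst score can be checked directly; this is a worked example of the numbers in the paragraph above, not the actual survey data:

```python
import math

# A respondent who answered in fractions rather than percents:
# "1" or "0.99" meant near-certainty, but read as a percent it is ~1%.
# Nine right answers at 1% each lose 2 points apiece under base-10 log:
per_right = math.log10(0.01)       # -2.0

# One wrong answer at 0.3% is binarized to 99.7% on the complement:
per_wrong = math.log10(1 - 0.003)  # about -0.0013

total = 9 * per_right + per_wrong  # about -18

# A total of -18 over ten questions implies a geometric-mean
# probability of roughly 1.6% per question:
implied = 10 ** (total / 10)       # about 0.016
```

So a respondent who got nine of ten questions right can still land at the very bottom of the score distribution, which is what flags the percent/fraction confusion.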
When I drop the 30 lowest scorers, the direction of the relationship flips: now, people with better log scores (i.e. closer to 0) give lower probabilities for MWI (with a text answer counting as a probability of 0, as most text answers were complaints that asking for a number didn't make sense).
What about Tragic Mistakes? These are people who assign 0% probability to a correct answer, or 100% probability to a wrong one, and so lose infinite points under a log scoring rule. Checking those showed both kinds of error, and also highlighted that several of the 'wrong' answers were spelling mistakes; I probably would have accepted "Oden" and "Mitocondria."
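Flagging these is mechanical once the answers are graded. A small sketch, assuming a hypothetical list of (probability, is_correct) pairs rather than the survey's actual columns:

```python
def tragic_mistakes(answers):
    """Count extreme-confidence errors: 0% on a correct answer or
    100% on a wrong one, either of which scores -infinity under a
    log scoring rule.

    answers: list of (probability, is_correct) pairs.
    """
    return sum(
        1 for p, correct in answers
        if (correct and p == 0.0) or (not correct and p == 1.0)
    )

# Two tragic mistakes: certain-and-wrong, and zero-on-the-right-answer.
print(tragic_mistakes([(0.0, True), (1.0, False), (0.7, True)]))  # 2
```

Note that the spelling-mistake cases above are a grading question, not a scoring one: "Oden" marked wrong at 100% confidence shows up here as a tragic mistake even though the respondent knew the answer.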
(Amusingly, the person with the most tragic mistakes (9 of them) supplied a probability for their answers instead of an answer, so they were 100% sure that the Battle of Trafalgar was fought off the coast of 100, which was the state where Obama was born.)
There's a tiny decline in tragic mistakes as P(MWI) increases, but I don't think I'd be confident in drawing conclusions from this data.
Sort-of related question: How do you compute calibration scores?
I was using a logarithmic scoring rule, with a base of 10. (What base you use doesn't really matter.) The Excel formula for the first question (I'm pretty sure I didn't delete any columns, so it should line up) was: