Nanashi comments on Calibration Test with database of 150,000+ questions - LessWrong

37 Post author: Nanashi 14 March 2015 11:22AM

Comment author: Nanashi 13 March 2015 06:40:00PM 1 point

That's a fair criticism, but if we're going down this road we've also gotta recognize the limitations of a multiple choice calibration test. Both styles suffer from the "You know it or you don't" dichotomy. If these questions were all multiple choice, you'd still have the same rectangular-shaped graph; it would just start at 50% (assuming a binary choice) instead of 0%.

The big difference is the solution sets that the different styles represent. There are plenty of situations in life where there are a few specific courses of action to choose from. But there are also plenty of situations where that's not the case.

But I will say that a multiple choice test definitely yields a "pretty" calibration curve much faster than an open-ended test. You've got a smaller range of values, and the format lets you more confidently rule out one answer or the other, so the curve smooths out sooner, whereas an open-ended test like this one will be pretty bottom-heavy for a while.

Comment author: Sarunas 14 March 2015 12:03:02AM 1 point

"I think the problem here is with many trivia questions you either know the answer or you don't"

That means that for those questions most probabilities are either close to 0 or close to 1. This suggests that given this set of questions it would probably be a good idea to increase "resolution" near those two points. For that purpose, perhaps instead of asking for confidence levels expressed as percentages you could ask for confidence levels expressed as odds or log odds. For example, users could express their confidence levels using odds expressed as ratios 2^n:1, for n=k,...,0,...,-k.
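To make the 2^n:1 suggestion concrete, here is a minimal sketch of how such an odds ladder maps back to probabilities. The function names and the choice of k = 5 are assumptions for illustration only; nothing here reflects how the actual test is implemented.

```python
def odds_to_probability(n):
    """Probability implied by odds of 2**n : 1 in favour of the answer."""
    odds = 2.0 ** n
    return odds / (odds + 1.0)


def odds_ladder(k):
    """Confidence levels for n = k, ..., 0, ..., -k (hypothetical helper)."""
    return [(n, odds_to_probability(n)) for n in range(k, -k - 1, -1)]


if __name__ == "__main__":
    # k = 5 is an arbitrary illustrative choice.
    for n, p in odds_ladder(5):
        print(f"2^{n}:1  ->  {p:.4f}")
```

With this scheme the steps bunch up near 0 and 1 and spread out around 50%, which is the extra "resolution" at the extremes described above.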

Comment author: Nanashi 14 March 2015 11:28:51AM 1 point

That's an interesting thought, but I suspect you'd have to answer a lot of questions to see any difference whatsoever. If you're perfectly calibrated and answer 100 questions on which you are either 99.99% or 99.9% confident, there's a very good chance (roughly 99% and 90% respectively) that you'll get all 100 questions right, regardless of which confidence level you pick.
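A quick sketch of the arithmetic behind that claim, assuming the questions are independent and that a perfectly calibrated answerer is right with exactly the stated probability on each one. The function name is made up for illustration.

```python
def prob_all_correct(confidence, n_questions):
    """Chance of a perfect score when each answer is independently
    correct with probability `confidence` (assumes independence)."""
    return confidence ** n_questions


if __name__ == "__main__":
    for confidence in (0.9999, 0.999):
        p = prob_all_correct(confidence, 100)
        print(f"{confidence:.2%} confident on 100 questions: "
              f"P(all correct) = {p:.3f}")
```

Under those assumptions a perfect score is the most likely outcome at either confidence level, so 100 questions tell you very little about which of the two levels was the better-calibrated choice.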