JoshuaFox comments on Credence calibration game FAQ - Less Wrong Discussion

13 Post author: Academian 26 November 2012 12:52AM

Comments (55)

Comment author: JoshuaFox 26 November 2012 11:05:05AM 0 points

What's a good result, both in terms of the number and the graph? What are other people's results? Not that I want to be too competitive, but I have no idea if I am doing very well or very badly.

Comment author: asparisi 26 November 2012 09:16:45PM 0 points

A high score seems to mean "My confident beliefs tend to be right."

Having your bars on the graph line up with the diagonal line would be an "ideal" graph (neither over- nor underconfident).

Comment author: JoshuaFox 27 November 2012 08:34:58AM * 0 points

What is a high score? I realize that there is no absolute scale, but I have no idea if 10 is good or 1000 is bad.

Comment author: ChristianKl 03 December 2012 11:02:30PM * 1 point

Out of 363 guesses my average score is 8. I'm from Germany, so my knowledge of some of the US-specific stuff isn't good.

Comment author: gjm 07 December 2012 11:46:12PM * 0 points

One anecdata point: after 200 answers I have an average score of 12.6; I have success rates of 54% for "50%" answers, 59% for "60%" answers, 75% for "70%" answers, 88% for "80%" and "90%" answers combined (average stated confidence 83%), and 100% for "99%" answers. (I've been very consistently underconfident.) I'm from the UK and for many of the sporting questions I couldn't even tell you what sport they're about. This feels to me like pretty good performance but I have little real basis for that opinion.

[EDITED to add: my counts are 33 @ 50%, 107 @ 60%, 40 @ 70%, 6 @ 80%, 3 @ 90%, 11 @ 99%. So lots of very unconfident answers.]

[EDITED again to add: Perhaps add an optional mode in which some score information gets shared, anonymously or otherwise, so that there can be a leaderboard and a display of many users' calibration graphs and so forth, for those who like to compete or to benchmark themselves.]

[EDITED again to add, though probably no one cares but me: after 400 questions all my statistics are basically the same as above, so apparently I'm (1) consistent and (2) a slow learner.]
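The counts and success rates reported in the comment above can be lined up bucket by bucket to see where the answers were under- or overconfident. This is only a rough sketch: the function names are made up for illustration, and the "80%" and "90%" answers are merged into one bucket at 83% because only a combined success rate was given for them.

```python
# (stated confidence %, number of answers, observed success rate %),
# copied from the comment above. The 80% and 90% answers are merged
# because only a combined success rate (88%) was reported for them.
BUCKETS = [
    (50, 33, 54),
    (60, 107, 59),
    (70, 40, 75),
    (83, 9, 88),    # 6 answers at 80% plus 3 at 90%
    (99, 11, 100),
]


def calibration_gaps(rows):
    """Return (stated, observed - stated) per bucket.

    A positive gap means the answers were right more often than
    claimed, i.e. underconfident.
    """
    return [(stated, observed - stated) for stated, _count, observed in rows]


def weighted_mean(rows, column):
    """Count-weighted mean of column 0 (stated) or column 2 (observed)."""
    total = sum(count for _stated, count, _observed in rows)
    return sum(row[column] * row[1] for row in rows) / total


if __name__ == "__main__":
    for stated, gap in calibration_gaps(BUCKETS):
        print(f"{stated}% answers: gap {gap:+d} points")
    print("mean stated confidence:", weighted_mean(BUCKETS, 0))
    print("mean observed success:", weighted_mean(BUCKETS, 2))
```

On these numbers the gap is positive in every bucket except the "60%" one, which fits the self-description of being mostly underconfident.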

Comment author: JoshuaFox 08 December 2012 07:27:11PM * 0 points

Thanks. FWIW, my average score was 13.

And yes, a sharing mode would be useful. As-is, we have no basis for comparison.

Comment author: asparisi 27 November 2012 12:49:18PM 0 points

Well, you can get up to 99 points for being 99 percent confident and getting the right answer, or minus several hundred (I have yet to fail at a 99 so I don't know how many) for failing at that same confidence level.

Wrong answers are, at the same confidence level, more effective at bringing down your score than right answers are at bringing it up, so in some sense as long as you are staying positive you're doing well.

But if you want to compare further, you'd have to take into account how many questions you've answered, as your lifetime total depends on how many questions you answer. (990 after 10 questions would be exceptional: best possible score. 990 after 1,000 questions means you are getting a little less than a point per question, overall.)
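The asymmetry between right and wrong answers, and the per-question comparison, can be sketched in a few lines. The thread never states the scoring formula; the logarithmic rule below, 100 * (1 + log2(p)), is an assumption chosen because it reproduces the 99 points for a correct 99% answer mentioned above. Under that assumption a wrong 99% answer would cost roughly 564 points. The function names are illustrative.

```python
import math


def score(confidence, correct):
    """Points for one answer under an ASSUMED log rule: 100 * (1 + log2(p)).

    p is the probability that was placed on the answer that turned
    out to be true. This rule is a guess, not taken from the thread.
    """
    p = confidence if correct else 1.0 - confidence
    return 100.0 * (1.0 + math.log2(p))


def average_score(lifetime_total, questions_answered):
    """Per-question average, the figure that is comparable between players."""
    return lifetime_total / questions_answered


if __name__ == "__main__":
    for confidence in (0.5, 0.6, 0.7, 0.8, 0.9, 0.99):
        right = score(confidence, True)
        wrong = score(confidence, False)
        print(f"{confidence:.0%}: right {right:+.0f}, wrong {wrong:+.0f}")
    print(average_score(990, 10))    # best possible: 99 points per question
    print(average_score(990, 1000))  # a little under one point per question
```

Under this assumed rule a "50%" answer scores zero either way, and at every higher confidence level the penalty for a wrong answer is larger than the reward for a right one, which matches the description above.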