Normal_Anomaly comments on A fun estimation test, is it useful? - Less Wrong

5 Post author: mwengler 20 December 2010 09:09PM


Comments (49)


Comment author: Normal_Anomaly 21 December 2010 12:39:28PM 2 points

I found another test that's more comprehensive. It has many more questions, lets you give a confidence estimate for each one, and shows how well calibrated you are at each confidence level from 0% to 100%. It also flags both underconfidence and overconfidence.

http://www.projectionpoint.com/test1.php

I got a 78 out of 100.
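The site doesn't spell out its method, but a per-level calibration check of the kind described above might look roughly like this. It is a minimal sketch, not the site's code: it assumes each answer is a (stated probability that the statement is true, whether it was true) pair, and the function names, the 5-point tolerance, and the sample answers are all hypothetical. A level counts as overconfident when the stated probability is more extreme (farther from 50%) than the observed hit rate, and underconfident when it is less extreme.

```python
from collections import defaultdict


def overconfidence_gap(confidence, observed):
    """Positive when the stated probability is more extreme than the observed rate."""
    if confidence > 0.5:
        return confidence - observed
    if confidence < 0.5:
        return observed - confidence
    # At exactly 50%, any departure of the observed rate from 50% means
    # a more confident answer would have been justified.
    return -abs(observed - 0.5)


def calibration_table(answers, tolerance=0.05):
    """Group answers by stated probability and compare with the observed hit rate.

    answers: iterable of (confidence, outcome) pairs, where confidence is the
    stated probability (0.0 to 1.0) that a statement is true and outcome is
    True if it was. Returns (confidence, count, observed_rate, verdict) rows
    sorted by confidence.
    """
    buckets = defaultdict(list)
    for confidence, outcome in answers:
        buckets[confidence].append(outcome)

    rows = []
    for confidence in sorted(buckets):
        outcomes = buckets[confidence]
        observed = sum(outcomes) / len(outcomes)
        gap = overconfidence_gap(confidence, observed)
        if gap > tolerance:
            verdict = "overconfident"
        elif gap < -tolerance:
            verdict = "underconfident"
        else:
            verdict = "calibrated"
        rows.append((confidence, len(outcomes), observed, verdict))
    return rows


# Hypothetical answers, made up only to exercise the function.
answers = (
    [(0.9, True)] * 3 + [(0.9, False)] * 2
    + [(0.1, False)] * 2
    + [(0.5, True), (0.5, False)]
)
for row in calibration_table(answers):
    print(row)
```

With only a handful of answers per level the observed rates are noisy, so a check like this needs many questions per confidence level before the verdicts mean much.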

Comment author: Emile 22 December 2010 10:22:00PM 0 points

I got 73.

I didn't find this test as good as the other one:

1) In the estimating test, you have to figure things out in a void, with no clues from the question itself. But in this test, if the question is whether Sarah Blogg was Humphrey Bogart's second wife, the mere fact that it's being asked moves my estimate from 0.00001% to 50%. So I often find myself trying to guess whether it's a trick question.

2) The results don't seem to take accuracy into account, which means you might get a perfect score by answering "50%" on every question (I haven't tried). A logarithmic scoring rule would be better. (Then again, I didn't dig very deeply into their formula.)

3) Their graph is ugly. The vertical gridlines don't line up with the numbers at the bottom! Geez!
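Point 2 can be made concrete. The following is a minimal sketch, not the site's actual formula (which isn't given here): a calibration-only score that compares each stated probability with the observed hit rate, set next to a logarithmic scoring rule. The function names, the `eps` clamp, and both 10-question answer sets are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict


def calibration_error(answers):
    """Weighted mean squared gap between each stated probability and its hit rate.

    Zero means perfectly calibrated; it says nothing about how informative
    the answers were.
    """
    buckets = defaultdict(list)
    for p_true, outcome in answers:
        buckets[p_true].append(outcome)
    total = 0.0
    for p_true, outcomes in buckets.items():
        observed = sum(outcomes) / len(outcomes)
        total += len(outcomes) * (p_true - observed) ** 2
    return total / len(answers)


def log_score(answers, eps=1e-6):
    """Average natural log of the probability given to what actually happened.

    0 is the best possible; more negative is worse. A strict log rule gives
    minus infinity for a 0% or 100% answer that turns out wrong; clamping to
    [eps, 1 - eps] is an assumption here to keep the number finite.
    """
    total = 0.0
    for p_true, outcome in answers:
        p = p_true if outcome else 1.0 - p_true
        total += math.log(min(max(p, eps), 1.0 - eps))
    return total / len(answers)


# Hypothetical 10-question test: 5 statements are true, 5 are false.
always_50 = [(0.5, True)] * 5 + [(0.5, False)] * 5
# Hypothetical informed answerer: 80% on 5 statements (4 true), 20% on 5 (1 true).
informed = (
    [(0.8, True)] * 4 + [(0.8, False)]
    + [(0.2, True)] + [(0.2, False)] * 4
)

for name, answers in [("always 50%", always_50), ("informed", informed)]:
    print(name, calibration_error(answers), round(log_score(answers), 3))
```

Both answer sets are perfectly calibrated, so a calibration-only score can't tell them apart. The log score does: about -0.50 per question for the informed answers versus ln(0.5), about -0.69, for always saying 50%.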

Comment author: Normal_Anomaly 23 December 2010 01:54:39AM 0 points

1) I like having at least some data; I still found myself using all 10 options at least once. That is, the test still relied to a large extent on my prior knowledge.

2) You're right about this. I tried it, and they don't: guessing 50% every time got me a perfect score. I don't know enough about designing these tests to build one with a log scoring rule, but it would definitely be nice to see one.

3) Ooh, that is weird. The gridlines don't seem to mean as much as the numbered labels; removing them would fix the problem.

It seems like neither of these tests is able to measure both calibration and discrimination.
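For what it's worth, the Brier score (the mean squared error of the stated probabilities) has a standard split, the Murphy decomposition, that reports the two separately: Brier = reliability - resolution + uncertainty. Reliability measures miscalibration (lower is better), resolution measures how far the hit rates at different answer levels spread from the overall base rate, i.e. discrimination (higher is better), and uncertainty depends only on the questions. Below is a minimal Python sketch, assuming answers are (stated probability, outcome) pairs; it is not how either test scores, and the function name and answer sets are hypothetical. Binning by each exact stated probability makes the identity hold exactly for a test with a fixed set of options.

```python
from collections import defaultdict


def brier_decomposition(answers):
    """Return (brier, reliability, resolution, uncertainty).

    answers: (stated probability, outcome) pairs. With answers binned by
    exact stated probability, brier == reliability - resolution + uncertainty.
    """
    n = len(answers)
    base_rate = sum(outcome for _, outcome in answers) / n
    bins = defaultdict(list)
    for p, outcome in answers:
        bins[p].append(outcome)

    reliability = 0.0  # miscalibration: lower is better
    resolution = 0.0   # discrimination: higher is better
    for p, outcomes in bins.items():
        n_k = len(outcomes)
        hit_rate = sum(outcomes) / n_k
        reliability += n_k * (p - hit_rate) ** 2
        resolution += n_k * (hit_rate - base_rate) ** 2
    reliability /= n
    resolution /= n
    uncertainty = base_rate * (1 - base_rate)  # depends only on the questions
    brier = sum((p - outcome) ** 2 for p, outcome in answers) / n
    return brier, reliability, resolution, uncertainty


# Hypothetical answer sets: 5 true and 5 false statements in both.
always_50 = [(0.5, True)] * 5 + [(0.5, False)] * 5
informed = (
    [(0.8, True)] * 4 + [(0.8, False)]
    + [(0.2, True)] + [(0.2, False)] * 4
)

for name, answers in [("always 50%", always_50), ("informed", informed)]:
    brier, rel, res, unc = brier_decomposition(answers)
    print(name, round(brier, 3), round(rel, 3), round(res, 3), round(unc, 3))
```

Both sets have zero reliability error, so a calibration-only score ranks them equal. The informed set has a resolution of 0.09 versus 0 for always answering 50%, and a correspondingly lower Brier score (0.16 versus 0.25).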