Open Thread, Jul. 27 - Aug 02, 2015

MrMind

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

That is why I looked at all 10 questions in aggregate.

Well, you did not look at calibration. You looked at overconfidence, which I don't think is a terribly useful metric -- it ignores the actual calibration (the match between the confidence and the answer) and just smushes everything into two averages.

It reminds me of an old joke about a guy who went hunting with his friend the statistician. They found a deer, the hunter aimed, fired -- and missed. The bullet went six feet to the left of the deer. Amazingly, the deer ignored the shot, so the hunter aimed again, fired, and this time the bullet went six feet to the right of the deer. "You got him, you got him!" yelled the statistician...

So, no, I don't think that overconfidence is a useful metric when we're talking about calibration.
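To make the point concrete, here is a minimal Python sketch on made-up data. The two respondents, the function names, and the use of the Brier score as a per-answer measure are all assumptions for illustration, not anything taken from the survey analysis. Both respondents come out with zero overconfidence, yet one is well calibrated and the other is confidently wrong every time.

```python
def overconfidence(confidences, outcomes):
    """Mean stated confidence minus mean accuracy (both on a 0-1 scale)."""
    n = len(confidences)
    return sum(confidences) / n - sum(outcomes) / n


def brier_score(confidences, outcomes):
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    n = len(confidences)
    return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / n


# Hypothetical respondent A: says 50% every time and gets half right.
conf_a = [0.5, 0.5, 0.5, 0.5]
out_a = [1, 0, 1, 0]

# Hypothetical respondent B: says 100% when wrong and 0% when right.
conf_b = [1.0, 0.0, 1.0, 0.0]
out_b = [0, 1, 0, 1]

for name, conf, out in [("A", conf_a, out_a), ("B", conf_b, out_b)]:
    print(name, overconfidence(conf, out), brier_score(conf, out))
```

The two averages cancel for B exactly the way the two shots do in the joke, while the per-answer score separates the two respondents.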

but I also did another analysis which looked at slopes across the range of subjective probabilities

Sorry, ordinary least-squares regression is the wrong tool to use when your response variable is binary. Your slopes are not valid. You need to use logistic regression.
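For anyone who wants to try the logistic version, a rough sketch in plain Python follows. The data are invented, `fit_logistic` is a hypothetical helper, and the hand-rolled Newton's method is only a stand-in for what a statistics package would normally do for you.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton's method.

    x: subjective probabilities on a 0-1 scale; y: 0/1 outcomes.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += yi - p
            g1 += (yi - p) * xi
            # Entries of the (negated) Hessian, X^T W X.
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break  # degenerate data; give up rather than divide by ~0
        # Newton step: solve the 2x2 system by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict(b0, b1, x):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


# Invented answers: five at 20% (1 correct), four at 50% (2 correct),
# five at 80% (4 correct).
x = [0.2] * 5 + [0.5] * 4 + [0.8] * 5
y = [1, 0, 0, 0, 0] + [1, 1, 0, 0] + [1, 1, 1, 1, 0]

b0, b1 = fit_logistic(x, y)
print(b0, b1, [predict(b0, b1, v) for v in (0.2, 0.5, 0.8)])
```

On this toy data the fitted curve should pass through the observed proportions (20%, 50%, 80%), since they happen to lie exactly on a logistic curve in `x`.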

Overconfidence is the main failure of calibration that people tend to make in the published research. If LWers are barely overconfident, then that is pretty interesting.

I used linear regression because perfect calibration is reflected by a linear relationship between subjective probability and correct answers, with a slope of 1.
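The slope-of-1 claim can be checked on toy data. The sketch below is only an illustration: the answers are invented and `ols_fit` is a hypothetical helper, not the code used for the survey analysis. It regresses the 0/1 outcome on stated probability for a respondent whose hit rate matches the stated probability at every level.

```python
def ols_fit(x, y):
    """Simple least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented, perfectly calibrated answers: 1 of 5 right at 20%,
# 2 of 4 right at 50%, 4 of 5 right at 80%.
x = [0.2] * 5 + [0.5] * 4 + [0.8] * 5
y = [1, 0, 0, 0, 0] + [1, 1, 0, 0] + [1, 1, 1, 1, 0]

intercept, slope = ols_fit(x, y)
print(intercept, slope)
```

This only shows the point estimate; it says nothing about whether the usual standard errors are appropriate for a binary response.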

If you prefer, here is a graph in the same style that Yvain used.

X-axis shows subjective probability, with responses divided into 11 bins (<5, <15, ..., <95, and 95+). Y-axis shows proportion correct in each bin, blue dots ...
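The binning described above can be sketched roughly as follows. The function names and the sample answers are made up, and treating each lower edge as inclusive (so exactly 5 lands in the "<15" bin) is an assumption read off the bin labels.

```python
def bin_index(percent):
    """Map a subjective probability (in percent) to one of 11 bins:
    0 is <5, 1 is 5 to <15, ..., 9 is 85 to <95, 10 is 95+."""
    if percent < 5:
        return 0
    return min(10, int((percent - 5) // 10) + 1)


def proportion_correct_by_bin(percents, outcomes):
    """Return {bin index: share of correct answers}, skipping empty bins."""
    totals = [0] * 11
    hits = [0] * 11
    for p, o in zip(percents, outcomes):
        b = bin_index(p)
        totals[b] += 1
        hits[b] += o
    return {b: hits[b] / totals[b] for b in range(11) if totals[b]}


# Invented answers: stated probability in percent and 0/1 correctness.
percents = [0, 10, 10, 50, 50, 50, 50, 90, 90, 100]
outcomes = [0, 0, 1, 1, 0, 1, 0, 1, 1, 1]

curve = proportion_correct_by_bin(percents, outcomes)
print(curve)
```

Plotting each bin's proportion against the bin's midpoint would give a curve in the same style as the graph described here; perfect calibration would sit on the diagonal.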
