jimrandomh comments on 2012 Survey Results - Less Wrong

80 Post author: Yvain 07 December 2012 09:04PM


Comment author: jimrandomh 29 November 2012 03:48:00PM 26 points

The calibration question is an n=1 sample on one of the two important axes (those axes being who's answering, and what question they're answering). Give a question that's harder than it looks, and people will come out overconfident on average; give a question that's easier than it looks, and they'll come out underconfident on average. Getting rid of this effect requires a pool of questions, so that it'll average out.
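The averaging-out point can be sketched in a quick simulation (hypothetical numbers, not from the survey): one question that everyone answers at 50% confidence but whose true hit rate is only 30% makes respondents look overconfident, while a pool whose difficulty errors cancel makes the same respondents look well calibrated.

```python
import random

random.seed(0)

def simulate(true_rates, reported_confs, n_people=10_000):
    """Each person answers every question at the stated confidence and is
    correct with that question's true rate. Returns the observed accuracy
    among all answers given at 50% confidence."""
    hits = total = 0
    for rate, conf in zip(true_rates, reported_confs):
        for _ in range(n_people):
            if conf == 0.5:
                total += 1
                hits += rate > random.random()
    return hits / total

# One question that's harder than it looks: everyone says 50%,
# but the true chance of being right is only 30%.
print(simulate([0.3], [0.5]))  # ~0.30: looks "overconfident"

# A pool of questions whose difficulty surprises cancel on average.
print(simulate([0.3, 0.5, 0.7], [0.5, 0.5, 0.5]))  # ~0.50: looks fine
```

The respondents are identical in both runs; only the question pool changes, which is the point.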

Comment author: Morendil 29 November 2012 06:26:32PM 8 points

Yep. (Or as Yvain suggests, give a question which is likely to be answered with a bias in a particular direction.)

It's not clear what you can conclude from the fact that 17% of the people who answered a single question at 50% confidence got it right. What you certainly can't conclude is that if you asked one of these people a hundred binary questions, and they answered each at 50% confidence, that person would get only 17% right. The latter is what would deserve to be called "atrocious"; I don't believe the adjective applies to the results observed in the survey.

I'm not even sure that you can draw the conclusion "not everyone in the sample is perfectly calibrated" from these results. Well, the people who were 100% sure they were wrong, and happened to be correct, are definitely not perfectly calibrated; but I'm not sure what we can say of the rest.

Comment author: CarlShulman 01 December 2012 09:18:50PM * 5 points

I have often pondered this problem with respect to some of the traditional heuristics and biases studies, e.g. the "above-average driver" effect. If people consult their experiences of subjective difficulty at doing a task, and then guess they are above average for the ones that feel easy, and below average for the ones that feel hard, this will to some degree track their actual particular strengths and weaknesses. Plausibly a heuristic along these lines gives overall better predictions than guessing "I am average" about everything.

However, if we focus in on activities that happen to feel unusually easy or unusually hard in general, then we can make the heuristic look bad by showing only its failures and not its successes. The name "heuristics and biases" does reflect this notion: we have heuristics because they usually work, but in some cases they produce biases, as an acceptable cost.

Comment author: steven0461 29 November 2012 11:28:36PM * 1 point

I would agree that this explains the apparently atrocious calibration. It's worth an edit to the main post; no reason to beat ourselves up needlessly.

People were answering different questions in the sense that each had an interval of their own choosing to assign a probability to, but different people's performance was obviously going to be strongly correlated: Bayes just happens to be the kind of guy who was born surprisingly early. If everyone had literally been asked to assign a probability to the exact same proposition, like "Bayes was born before 1750" or "this coin will come up heads", that would have been a more extreme case. We'd have found that events people predicted with probability x% actually happened either 0% or 100% of the time, and it wouldn't mean people were infinitely badly calibrated.
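The coin case can be made concrete with a small sketch (a hypothetical setup, not survey data): a thousand perfectly calibrated forecasters all assign 50% to a single coin flip, yet the observed frequency in their 50% bucket is necessarily 0% or 100%, because there is only one shared realization.

```python
import random

random.seed(1)

# One shared proposition: "this coin will come up heads."
# Every forecaster assigns it 50%, which is the correct probability.
outcome = random.random() < 0.5  # the single realized coin flip

n_forecasters = 1_000
# All forecasters share the same outcome, so the observed accuracy
# in the 50% confidence bucket collapses to 0.0 or 1.0.
observed_accuracy = sum(outcome for _ in range(n_forecasters)) / n_forecasters
print(observed_accuracy)  # 0.0 or 1.0, never 0.5
```

Averaging over forecasters does nothing here; only averaging over independent questions would recover the 50% frequency.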

Comment author: [deleted] 30 November 2012 12:12:00AM -1 points

All of that also applies to the year calibration questions in previous surveys and yet people did much better in those.

Comment author: steven0461 30 November 2012 12:48:15AM 4 points

Because they weren't about events that occurred surprisingly early.