KPier comments on Calibrate your self-assessments - Less Wrong

68 points · Post author: Yvain 09 October 2011 11:26PM


Comment author: KPier 09 October 2011 05:31:04PM 9 points

Since I first read about calibration on LessWrong, I've been trying this with tests and debate tournaments.

With a sample size of about 50: 95% of my estimated test grades are within 3% of my actual test grades.

On debate, however, if I am 60% confident I won a round, I won it 90% of the time; if I am 80% confident I won, I won it 100% of the time. Other people seem to be much better than me at assessing the probability that I won a debate round (if they observed it).

It seems that I am really good at estimating in some situations and really bad in others, which means that switching wholesale from Inside View to Outside View wouldn't necessarily be an improvement, but that in certain situations it would help me enormously. Has anyone else encountered this?

Comment author: AShepard 09 October 2011 09:54:52PM 4 points

Interesting that your debate predictions tend to be too low. In my debate experience, nearly everyone consistently overestimated their likelihood of winning a given round. This bias tended to increase the better the debaters perceived themselves to be.

Comment author: KPier 10 October 2011 12:53:39AM 4 points

I think a lot of debaters I know fall into the general trap of believing the things they argue. In a debate round, you have to be focused on the mentality of "I'm winning", or you won't be able to convince the judge of that; I am probably atypical in that I notice that kind of self-deception and apparently overcorrect for it. I've convinced a number of my teammates to try this experiment as well, and most of them follow the trend you noticed.

Comment author: FiftyTwo 10 October 2011 01:24:25AM 1 point

My own experience of debating is that, while I can estimate the 'strategic' side relatively effectively, I find it more difficult to predict whether the judges will accept an individual argument. I've noticed this as a problem with several debaters, often due to the inferential gaps between them and the judges (e.g. assuming some psychological/philosophical/economic concept is intuitively obvious).

[Incidentally, I'm involved in UK BP debating, so if that makes it probable we've met, PM me a name or a hint.]

Comment author: KPier 10 October 2011 01:35:42AM 1 point

Nope, US high school policy. I'm thinking of writing an article on debate and rationality (though not until after I'm done applying to college, which will be January); if you'd have something to say about that, PM me.

Comment author: Dmytry 20 March 2012 08:54:23AM 0 points

Could debate tournaments be to some extent responsible for those extremely irritating, counterproductive arguments online where you are left wondering what exactly convinced the other side so thoroughly, and why they won't tell you what it is? I never did debates at school.

Comment author: TheOtherDave 09 October 2011 06:57:09PM 0 points

I've encountered similar things insofar as I'm better calibrated for some tasks than others. And I agree with you that defining the right reference classes for when to trust my estimations vs. when to trust the outside view (and which outside views to trust) is important.

I'm curious: if you re-express your data set in terms of standard deviations... e.g., the percentage of your estimated test grades that are within a std dev of the correct answer... rather than absolute percentages, do you still get very different results in the two cases?

Comment author: KPier 10 October 2011 12:56:44AM 0 points

Maybe I'm being really stupid, but how exactly would I define a standard deviation of the correct answer? Using the distribution for the whole class?

Comment author: TheOtherDave 10 October 2011 04:33:32AM 3 points

I meant within the set of your 50 test scores, assuming they're normalized to a common range.

To pick an extreme example: if all your test scores fall between 92% and 98%, it becomes less remarkable that your estimations of your test scores all fall within 3% of your actual test scores... anyone else could do about as well, given that fact about the data set. So it seems that knowing something about the distribution is helpful in reasoning about the causes of the differences in the accuracy of your judgments.

Comment author: KPier 11 October 2011 12:56:47AM 0 points

Oh, that makes sense.

Nope, still a big difference. For example, here are my scores from the last few weeks:

Predicted/Actual: 98/100 72/72.5 94/94 85/86 82.5/87.5 90/92
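[For readers who want to try TheOtherDave's normalization on these six scores, here is a minimal Python sketch. It is an illustration added for this archive, not part of the original exchange, and it uses only the six (predicted, actual) pairs posted above, not KPier's full data set of ~50.]

```python
import statistics

# KPier's six (predicted, actual) test scores from the comment above
pairs = [(98, 100), (72, 72.5), (94, 94), (85, 86), (82.5, 87.5), (90, 92)]
actual = [a for _, a in pairs]

# Population standard deviation of the actual scores
sd = statistics.pstdev(actual)

# Each prediction error expressed in units of that standard deviation
errors_in_sd = [abs(p - a) / sd for p, a in pairs]

print(f"std dev of actual scores: {sd:.2f}")          # ~8.54
print([round(e, 2) for e in errors_in_sd])            # every error under 0.6 sd
```

On this small sample the actual scores spread over roughly one standard deviation of 8.5 points, while every prediction lands within about 0.6 standard deviations of the true score, so the calibration gap with the debate predictions persists even after normalizing.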

Comment author: Luke_A_Somers 11 October 2011 02:26:42PM 0 points [-]

Interesting that there were no too-high predictions.