satt comments on Checking Kurzweil's track record - Less Wrong

12 Post author: Stuart_Armstrong 30 October 2012 11:07AM


Comment author: satt 03 November 2012 05:32:39PM 2 points

One way to gauge how reliable people's judgements are: have multiple people rate each Kurzweil prediction and see how well their ratings agree. So far LWers have committed to checking at least 200 predictions, so if everyone follows through you'll be able to get multiple ratings of at least 28 questions. Those overlapping ratings could then be cross-checked for each question.
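(The simplest version of that cross-check is just the raw agreement rate on the overlapping questions. A minimal sketch, with hypothetical verdicts standing in for real raters' data:)

```python
# Hypothetical example: two raters' verdicts on the same five predictions.
# The verdict labels here are illustrative, not the actual rating scheme.
ratings_a = ["true", "false", "true", "weakly true", "false"]
ratings_b = ["true", "false", "false", "weakly true", "false"]

# Fraction of predictions on which the two raters gave the same verdict.
matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
agreement = matches / len(ratings_a)
print(f"Raw agreement: {agreement:.0%}")  # 4 of 5 match -> 80%
```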

(I won't volunteer to rate any statements myself because (1) I'm lazy; (2) I already have a mildly negative view of Kurzweil's predictive ability, which might make me biased; and (3) I read your earlier post and re-rated the 10 Age of Spiritual Machines predictions in that post myself, so I've already been primed in that respect.)

Comment author: Unnamed 04 November 2012 02:59:33AM 3 points

> One way to gauge how reliable people's judgements are: have multiple people rate each Kurzweil prediction and see how well their ratings agree.

This is a good idea. For measures that require a rater's judgment, it's standard operating procedure to have 2 raters for at least some of the items and to report the agreement rate on those items ("inter-rater reliability"). Be sure to vary which raters overlap: for example, don't give gwern and bsterrett the same 10 predictions; instead have maybe one prediction that they both rate, one where bsterrett & Tenoke overlap, etc. That way the agreement rate tells you something about agreement among all of the raters, not just between particular pairs.
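(Raw agreement overstates reliability when some verdicts are much more common than others, so inter-rater reliability is usually reported as a chance-corrected statistic such as Cohen's kappa. A hedged sketch in plain Python, with the rating labels purely illustrative:)

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement (Cohen's kappa) between two raters.

    kappa = (p_observed - p_chance) / (1 - p_chance), where p_chance is
    the agreement expected if both raters assigned labels independently
    at their observed marginal frequencies.
    """
    n = len(ratings_a)
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Illustrative verdicts: 3 of 4 agree, but kappa discounts chance agreement.
print(cohens_kappa(["T", "T", "F", "F"], ["T", "F", "F", "F"]))  # 0.5
```

(This undefined when both raters always give the same single label, p_chance = 1; real datasets with varied verdicts avoid that edge case.)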

In cases where the 2 raters disagree, you could just have a 3rd rater rate it and then go with their rating, or you could do something more complicated (like having the two raters discuss it and try to reach a consensus).
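(The simpler of those two options, deferring to a third rater only when the first two disagree, can be sketched in a few lines; the function and label names are hypothetical:)

```python
def resolve(rating1, rating2, third_rater):
    """Tie-break sketch: if the first two raters agree, keep their shared
    rating; otherwise call in a third rater and go with their verdict.
    `third_rater` is a callable so the extra rating is only requested
    when it's actually needed."""
    if rating1 == rating2:
        return rating1
    return third_rater()

# Agreement stands on its own; disagreement defers to the third rater.
print(resolve("true", "true", lambda: "false"))   # true
print(resolve("true", "false", lambda: "false"))  # false
```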