Unnamed comments on Checking Kurzweil's track record - Less Wrong
This is a good idea. It's standard operating procedure (for measures which require a rater's judgment) to have 2 raters for at least some of the items, and to report the agreement rate on those items ("inter-rater reliability"). Be sure to vary which raters overlap; for example, don't give gwern and bsterrett the same 10 predictions. Instead, maybe have one prediction that they both rate, another where bsterrett & Tenoke overlap, etc. That way the agreement rate tells you something about how much agreement there is across all of the raters, not just between particular pairs.
In cases where the 2 raters disagree, you could just have a 3rd rater rate it and then go with their rating, or you could do something more complicated (like having the two raters discuss it and try to reach a consensus).
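To make the overlap scheme concrete, here's a minimal sketch of computing pairwise agreement rates from such a design. The rater names come from the comment above, but the ratings themselves are made-up illustrative data; percent agreement is the simplest inter-rater statistic (a real analysis might use Cohen's kappa to correct for chance agreement):

```python
from itertools import combinations

# Hypothetical ratings: rater -> {prediction_id: verdict}
# Each rater covers a different subset, with small overlaps between pairs.
ratings = {
    "gwern":     {1: "true", 2: "false", 3: "true"},
    "bsterrett": {1: "true", 2: "true",  4: "false"},
    "Tenoke":    {3: "true", 4: "false", 5: "true"},
}

def pairwise_agreement(ratings):
    """Percent agreement on the predictions each pair of raters both rated."""
    results = {}
    for a, b in combinations(ratings, 2):
        shared = set(ratings[a]) & set(ratings[b])
        if not shared:
            continue  # this pair has no overlapping items
        agree = sum(ratings[a][i] == ratings[b][i] for i in shared)
        results[(a, b)] = agree / len(shared)
    return results

print(pairwise_agreement(ratings))
```

With every pair overlapping on at least one item, a low agreement rate for one pair (here, gwern and bsterrett disagree on prediction 2) flags ambiguity in the rating criteria rather than just an idiosyncratic rater.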