Predictions of the future rely, to a much greater extent than in most fields, on the personal judgement of the expert making them. Just one problem - personal expert judgement generally sucks, especially when the experts don't receive immediate feedback on their hits and misses. Formal models perform better than experts, but when talking about unprecedented future events such as nanotechnology or AI, the choice of the model is also dependent on expert judgement.
Ray Kurzweil has a model of technological intelligence development where, broadly speaking, evolution, pre-computer technological development, post-computer technological development and future AIs all fit into the same exponential increase. When assessing the validity of that model, we could look at Kurzweil's credentials, and maybe compare them with those of his critics - but Kurzweil has given us something even better than credentials, and that's a track record. In various books, he's made predictions about what would happen in 2009, and we're now in a position to judge their accuracy. I haven't been satisfied by the various accuracy ratings I've found online, so I decided to do my own assessments.
I first selected ten of Kurzweil's predictions at random, and gave my own estimation of their accuracy. I found that five were to some extent true, four were to some extent false, and one was unclassifiable.
But of course, relying on a single assessor is unreliable, especially when some of the judgements are subjective. So I started a call for volunteers to get assessors. Meanwhile Malo Bourgon set up a separate assessment on Youtopia, harnessing the awesome power of altruists chasing after points.
The results are now in, and they are fascinating. They are...
Oops, you thought you'd get the results right away? No, before that, as on Oscar night, I first want to thank assessors William Naaktgeboren, Eric Herboso, Michael Dickens, Ben Sterrett, Mao Shan, quinox, Olivia Schaefer, David Sønstebø and one who wishes to remain anonymous. I also want to thank Malo, and Ethan Dickinson and all the other volunteers from Youtopia (if you're one of these, and want to be thanked by name, let me know and I'll add you).
It was difficult deciding on the MVP - no actually it wasn't, that title and many thanks go to Olivia Schaefer, who decided to assess every single one of Kurzweil's predictions, because that's just the sort of gal that she is.
The exact details of the methodology, and the raw data, can be accessed through here. But in summary, volunteers were asked to assess the 172 predictions (from "The Age of Spiritual Machines") on a five-point scale: 1=True, 2=Weakly True, 3=Cannot decide, 4=Weakly False, 5=False. If we total up all the assessments made by my direct volunteers, we have:
As can be seen, most assessments were rather emphatic: fully 59% were either clearly true or false. Overall, 46% of the assessments were false or weakly false, and 42% were true or weakly true.
But what happens if, instead of averaging across all assessments (which allows assessors who have worked on a lot of predictions to dominate) we instead average across the nine assessors? Reassuringly, this makes very little difference:
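To make the difference between the two averaging schemes concrete, here is a minimal sketch in Python. The ratings in it are invented purely for illustration and are not the data from this exercise; the assessor names and the helper functions (distribution, pooled_distribution, per_assessor_distribution) are likewise hypothetical. It simply shows how pooling every assessment lets a prolific assessor dominate, while averaging each assessor's own distribution gives every assessor equal weight.

```python
from collections import Counter

# Five-point scale from the post: 1=True, 2=Weakly True, 3=Cannot decide,
# 4=Weakly False, 5=False.
SCALE = (1, 2, 3, 4, 5)

# HYPOTHETICAL ratings, invented for illustration only (not the real data).
ratings_by_assessor = {
    "assessor_a": [1, 1, 5, 5, 4, 2, 3, 5],  # a prolific assessor
    "assessor_b": [1, 5],
    "assessor_c": [2, 4, 4, 1],
}

def distribution(ratings):
    """Fraction of ratings falling on each point of the scale."""
    counts = Counter(ratings)
    return {s: counts[s] / len(ratings) for s in SCALE}

def pooled_distribution(by_assessor):
    """Pool every assessment together; prolific assessors weigh more."""
    pooled = [r for ratings in by_assessor.values() for r in ratings]
    return distribution(pooled)

def per_assessor_distribution(by_assessor):
    """Average each assessor's own distribution; every assessor counts equally."""
    dists = [distribution(r) for r in by_assessor.values()]
    return {s: sum(d[s] for d in dists) / len(dists) for s in SCALE}

pooled = pooled_distribution(ratings_by_assessor)
averaged = per_assessor_distribution(ratings_by_assessor)
for s in SCALE:
    print(f"{s}: pooled={pooled[s]:.2f}  per-assessor={averaged[s]:.2f}")
```

If the two columns printed by this sketch are close for real data, as they reportedly were here, then no single assessor is driving the overall picture. Note that the sketch assumes every assessor rated at least one prediction.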
What about the Youtopia volunteers? Well, they have a decidedly different picture of Kurzweil's accuracy:
This gives a combined true score of 30%, and combined false score of 57%! If my own personal assessment was the most positive towards Kurzweil's predictions, then Youtopia's was the most negative.
Putting this all together, Kurzweil certainly can't claim an accuracy above 50% - a far cry from his own self-assessment of either 102 out of 108 or 127 out of 147 correct, roughly 94% and 86% respectively (with caveats that "even the predictions that were considered 'wrong' in this report were not all wrong"). And consistently, slightly more than 10% of his predictions are judged "impossible to decide".
As I've said before, these were not binary yes/no predictions - even a true rate of 30% is much higher than chance. So Kurzweil remains an acceptable prognosticator, with very poor self-assessment.
There are a variety of reasons interpreters might think that a prediction didn't come true, while Kurzweil boldly claims that it did:
1. Kurzweil didn't express himself clearly, so interpreters misunderstood what the prediction really was. Miscommunication adds random noise, and most randomly generated predictions will turn out false, so this will skew the results against Kurzweil.
2. Kurzweil's predictions were vague, so charitable interpreters will think they're basically true, while less charitable interpreters will think they're basically false. And we can expect random LessWrongers to be less charitable toward Kurzweil than Kurzweil is toward Kurzweil.
3. Interpreters tend to be factually mistaken about current events, in a specific direction: they are ignorant of the nature, existence, or prevalence of the latest innovations in technology and culture.
4. Kurzweil tends to be factually mistaken about current events, in a specific direction: he thinks a variety of technologies are more advanced, and more widespread, than they really are.
5. There are systemic differences in the evaluation scales used by Kurzweil and by others. For instance, Kurzweil and Armstrong individuate 'predictions' differently, lumping and splitting at different points in the source text. There may also be systemic disagreements about how (temporally and technologically) precise an interpretation must be to count as 'correct,' and about whether grammatical forms like 'X is Y' most closely mean 'X is always Y', 'X is usually Y', 'X is commonly Y', 'X is sometimes (occasionally) Y', or 'X is Y at least once'. This ties into vagueness, but may bias the results due to linguistic variation rather than just as a result of a generic degree of interpretive charity.
I'm particularly curious about testing 3, since the strongest criticism Kurzweil could make of our methodology for assessing his accuracy is that our reviewers simply got the facts wrong. We can calibrate our assumptions about the accuracy and up-to-dateness of LessWrongers regarding technology generally. Or more specifically we can expose them to Kurzweil's arguments and see how much their assessment of his predictive success changes after hearing why he thinks he got a certain prediction 'correct'.
There's clearly a disconnect between his 'computer' and the general meaning of 'computer'; a multicore processor isn't more than one computer, and it wasn't in 1990.
Also, he seems to regard things as 'typical' that I would call 'common'; I say 'common' when it isn't sur...