Wisdom of the Crowd: not always so wise
I have a confession to make: I have been holding off on "publishing" the results of an experiment because they were uninteresting. You may recall that some time ago I made a post asking people to take a survey so that I could look at a small variation of the typical "Wisdom of the Crowds" experiment, in which people estimate a value and the average of the crowd's estimates is better than all or almost all of the individual estimates. Since LessWrong is full of people who like to do these kinds of things (thank you!), I got 177 responses - many more than I was hoping for!
I am now coming back to this because I happened upon an older post by Eliezer saying the following:
When you hear that a classroom gave an average estimate of 871 beans for a jar that contained 850 beans, and that only one individual student did better than the crowd, the astounding notion is not that the crowd can be more accurate than the individual. The astounding notion is that human beings are unbiased estimators of beans in a jar, having no significant directional error on the problem, yet with large variance. It implies that we tend to get the answer wrong but there's no systematic reason why. It requires that there be lots of errors that vary from individual to individual - and this is reliably true, enough so to keep most individuals from guessing the jar correctly. And yet there are no directional errors that everyone makes, or if there are, they cancel out very precisely in the average case, despite the large individual variations. Which is just plain odd. I find myself somewhat suspicious of the claim, and wonder whether other experiments that found less amazing accuracy were not as popularly reported.
(Emphasis added.) It turns out that I myself was sitting upon exactly such results.
The results are here. Sheet 1 shows the raw data and Sheet 3 shows some values computed from those numbers. A few values that were clearly either jokes or mistakes (like not noticing the answer was in millions) were removed. In summary: according to Wikipedia, there were 1000 million people in Africa as of 2009, whereas the estimate from LessWrong was 781 million; and the first transatlantic telephone call happened in 1926, whereas the average from the poll was 1899.
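To put those two misses in perspective, here is a minimal sketch of the arithmetic, using only the four numbers quoted above. The function name `relative_error` is my own, and I use a plain difference in years for the telephone question because a percentage of a calendar year is not very meaningful.

```python
def relative_error(estimate, truth):
    """Signed error as a fraction of the true value (negative = underestimate)."""
    return (estimate - truth) / truth

# Population of Africa, in millions (crowd estimate vs. Wikipedia figure).
africa_error = relative_error(781, 1000)

# Year of the first transatlantic telephone call (crowd average vs. actual).
call_year_gap = 1899 - 1926

print(f"Africa population: {africa_error:+.1%}")
print(f"Telephone call: {call_year_gap:+d} years")
```

Both errors are negative, so the crowd underestimated on both questions.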
There! I've come clean!
I had deferred making this public because I thought the idea I was trying to test wasn't really being tested in this experiment, regardless of the results. The idea (see my original post linked above) was to see whether selecting between two choices would still let the crowd average out to the correct value (this two-option choice was meant to reflect the structure of some democracies). But how to interpret the results? It seemed that my selection of values was too important, and that the average would change depending on what I picked even if everyone were to make an estimate, then look at the two options and choose the best one. So perhaps the only result of note here is that, for the questions given, Less Wrong users were not particularly great at being a wise crowd.
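The worry about my choice of options can be illustrated with a toy sketch. Everything in it is an assumption for illustration: the five private estimates are made up (they are not survey data), and I assume each person simply picks whichever of the two offered options is nearer to their own estimate.

```python
from statistics import mean

def crowd_average_of_choices(private_estimates, options):
    """Each person picks the option closest to their own estimate;
    return the average of the picked options."""
    picks = [min(options, key=lambda opt: abs(opt - est))
             for est in private_estimates]
    return mean(picks)

# Hypothetical private estimates (millions); NOT taken from the survey.
estimates = [600, 700, 800, 900, 1200]

plain_average = mean(estimates)
wide_options = crowd_average_of_choices(estimates, (500, 1500))
narrow_options = crowd_average_of_choices(estimates, (900, 1100))

print(plain_average, wide_options, narrow_options)
```

The same crowd yields a different "average" for each pair of options, and neither matches the plain average of the underlying estimates, which is the interpretation problem described above.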
Doing Science! Open Thread Experiment Results
Early in the month I announced that I was doing an experiment: I was going to start two Open Threads in January (one on the 1st, and the other on the 15th) and compare the number of comments on these threads to those of other months. My hypothesis was that having two Open Threads would raise the overall number of comments.
The reason for this experiment was the recent discussion about how useful threads such as these get buried quickly. Well, the experiment is over now, and here are the results:
I did a search for Open Threads, and entered all the monthly ones I could find into an Excel spreadsheet. I made them into a graph, and I discovered an anomaly. There was an 8-month timespan from February 2010-September 2010, in which the comment counts were extremely high (up to 2112). Many of these threads had 2, 3, or 4 parts, because they were getting filled up.
I wasn't around LW back then, and I don't feel like reading through them all, so I don't know why this time period was so active. My current hypothesis (with P=.75) is that the anomalous time period was before the Discussion section was created. I'm sure I could look it up to see if I'm right, but I bet one of the long-term LWers already knows whether this is true, so I'll crowd-source the info. (Comment below if you know whether my hypothesis is correct or incorrect.)
Now for the data:
The January 1-15, 2012 thread had: 122 comments
The January 16-31, 2012 thread had: 236 comments
For a grand total of: 358 comments in Jan 2012
The average Open Thread had: 448.6 comments
The median Open Thread had: 204 comments
The average Open Thread of the past 14 months had: 126.5 comments
So overall, January had FEWER comments than the average monthly thread, but more than the median.
IF however we look only at the past 14 months (the period since the anomaly ended), then the January 2012 Open Threads had almost THREE TIMES the average.
My original hypothesis had probabilities assigned to various increases in comment rate, but I was way off, because I didn't at all think the count would shrink (if we include the anomaly) or that it would be almost three times as large (if we don't).
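For anyone who wants to check the comparisons, here is a minimal sketch of the arithmetic using only the figures listed above (the variable names are mine).

```python
# Comment counts quoted in the post.
first_half = 122        # January 1-15, 2012 thread
second_half = 236       # January 16-31, 2012 thread
all_time_mean = 448.6   # average Open Thread, anomaly included
all_time_median = 204   # median Open Thread
recent_mean = 126.5     # average over the past 14 months

january_total = first_half + second_half

vs_all_time_mean = january_total / all_time_mean
vs_median = january_total / all_time_median
vs_recent_mean = january_total / recent_mean

print(f"January total: {january_total}")
print(f"vs all-time mean: {vs_all_time_mean:.2f}x")
print(f"vs median:        {vs_median:.2f}x")
print(f"vs recent mean:   {vs_recent_mean:.2f}x")
```

January comes out at roughly 0.8 times the all-time average and about 2.8 times the 14-month average, which is the same as being about 183% larger.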
Here's a handy-dandy chart, because everything is better with pictures in!


