Wisdom of the Crowd: not always so wise

tgb

I have a confession to make: I have not been "publishing" the results of an experiment because the results were uninteresting. You may recall that some time ago I made a post asking people to take a survey so that I could look at a small variation of the typical "Wisdom of the Crowds" experiment, where people estimate a value and the average of the crowd's estimates is better than all or almost all of the individual estimates. Since LessWrong is full of people who like to do these kinds of things (thank you!), I got 177 responses - many more than I was hoping for!

I am now coming back to this since I happened upon an older post by Eliezer saying the following:

When you hear that a classroom gave an average estimate of 871 beans for a jar that contained 850 beans, and that only one individual student did better than the crowd, the astounding notion is not that the crowd can be more accurate than the individual. The astounding notion is that human beings are unbiased estimators of beans in a jar, having no significant directional error on the problem, yet with large variance. It implies that we tend to get the answer wrong but there's no systematic reason why. It requires that there be lots of errors that vary from individual to individual - and this is reliably true, enough so to keep most individuals from guessing the jar correctly. And yet there are no directional errors that everyone makes, or if there are, they cancel out very precisely in the average case, despite the large individual variations. Which is just plain odd. I find myself somewhat suspicious of the claim, and wonder whether other experiments that found less amazing accuracy were not as popularly reported.

(Emphasis added.) It turns out that I myself was sitting upon exactly such results.

The results are here. Sheet 1 shows the raw data and Sheet 3 shows some values computed from those numbers. A few values that were clearly either jokes or mistakes (like not noticing the answer was in millions) were removed. In summary: according to Wikipedia, there were 1000 million people in Africa (as of 2009), whereas the estimate from LessWrong was 781 million; and the first transatlantic telephone call happened in 1926, whereas the average from the poll was 1899.
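To make the size of the misses concrete, here is a minimal sketch that computes the signed error of the crowd figure for each question. It uses only the four numbers quoted above; the function name `crowd_error` and the dictionary layout are my own for illustration, not anything from the spreadsheet.

```python
# Figures as reported in the post: (true value, crowd figure).
results = {
    "Population of Africa in 2009 (millions)": (1000, 781),
    "Year of first transatlantic telephone call": (1926, 1899),
}

def crowd_error(truth, crowd_value):
    """Signed error of the crowd figure (negative means an underestimate)."""
    return crowd_value - truth

for question, (truth, crowd_value) in results.items():
    error = crowd_error(truth, crowd_value)
    print(f"{question}: crowd {crowd_value}, truth {truth}, error {error:+d}")
```

Both errors come out negative, so on these two questions the crowd undershot in the same direction rather than scattering evenly around the true value.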

There! I've come clean!

I had deferred making this public because I thought the idea that I was trying to test wasn't really being tested in this experiment, regardless of the results. The idea (see my original post linked above) was to see whether selecting between two choices would still let the crowd average out to the correct value (this two-option choice was meant to reflect the structure of some democracies). But how to interpret the results? It seemed that my selection of values was too important and that the average would change depending on what I picked, even if everyone were to make an estimate, then look at the two options and choose the best one. So perhaps the only result of note here is that for the questions given, Less Wrong users were not particularly great at being a wise crowd.
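The worry about the choice of options can be illustrated with a small simulation. This is only a sketch under assumptions of my own: each simulated person has an unbiased, normally distributed private estimate (true value 1000, spread 200), and votes for whichever of the two offered options is closer to it. None of these numbers come from the actual survey, and `average_of_choices` is a hypothetical helper.

```python
import random

def average_of_choices(options, truth=1000.0, spread=200.0, n_people=20000, seed=0):
    """Each simulated person forms an unbiased private estimate, then votes for
    whichever of the two options is closer to it. Returns the mean of the votes."""
    rng = random.Random(seed)
    low, high = options
    total = 0.0
    for _ in range(n_people):
        estimate = rng.gauss(truth, spread)
        # Pick the option nearest to the private estimate.
        choice = low if abs(estimate - low) <= abs(estimate - high) else high
        total += choice
    return total / n_people

symmetric = average_of_choices((800, 1200))  # options straddle the truth evenly
lopsided = average_of_choices((900, 2000))   # one option is far from the truth
print(round(symmetric), round(lopsided))
```

Under these assumptions the symmetric pair averages out close to the true value, while the lopsided pair lands well below it, even though the private estimates are unbiased in both runs. The average of the votes reflects the options offered as much as what the crowd believes.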

That's kind of Eliezer's point when he talks about how astounding it is that human beings are unbiased estimators of beans in a jar. I'd agree that it's astounding, but there are plenty of other statistical phenomena that astound me equally, so I've learned to not treat my level of astonishment as a precision tool for judging incredibility.

To some extent, I suspect the mechanism of estimation plays a significant role. I doubt very much that human beings have built-in heuristics for appraising large numbers of objects. Arithmetic is a fairly novel concept, evolutionarily speaking, and some cultures don't even have the natural numbers.

So when we try and guess the number of beans in a jar, there's presumably no single go-to mechanism we're using to come up with that value. It will be some sort of aggregate of sources, such as our past experience of beans in jars, visualisations of what 200 or 400 or 600 beans all in one place might look like, or rough guesses of volume and packing density. It isn't even necessarily a transparent process. If you try and make a rough estimate of something, aren't you using some sort of basis for that? It's not like the number just pops into your head. You wrestle with it for a little while.

Individual components of that estimation may be subject to bias in a given direction, but over enough sources, over enough people with many different estimation criteria, I wouldn't trust there to necessarily be a demonstrable bias over repeated experiments without deliberate intervention on the part of the experimenter, such as using a container of an unusual shape that would result in a known overestimation of its volume.

Edit: I should also add an expectation of bias idiosyncratic to specific questions. For example, I think it was Yvain's most recent LW membership poll that asked for the date Newton published his Philosophiæ Naturalis Principia Mathematica. If there was a widely-believed false date for this event, that would be an obvious source of noise that wouldn't be cancelled out by corresponding noise on the other side of the true value.
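A rough way to see the difference between the two kinds of error is to simulate them. The sketch below assumes each guess is the true value plus independent personal noise, plus an optional error shared by everyone (such as a widely believed wrong date). The noise level, crowd size, and size of the shared bias are made up for illustration; 850 is borrowed from the bean jar in Eliezer's quote.

```python
import random

def crowd_average(truth, n_people, noise_sd, shared_bias=0.0, seed=0):
    """Average of guesses, where guess = truth + shared_bias + personal noise."""
    rng = random.Random(seed)
    guesses = [truth + shared_bias + rng.gauss(0.0, noise_sd) for _ in range(n_people)]
    return sum(guesses) / n_people

TRUTH = 850  # bean count from the quoted classroom example
unbiased = crowd_average(TRUTH, n_people=10000, noise_sd=200)
biased = crowd_average(TRUTH, n_people=10000, noise_sd=200, shared_bias=-100)
print(f"no shared bias: {unbiased:.1f}")
print(f"shared bias of -100: {biased:.1f}")
```

In this toy model the independent noise mostly washes out in the average, but the shared error passes straight through: no amount of extra respondents moves the biased crowd back toward 850.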

Individual components of that estimation may be subject to bias in a given direction, but over enough sources, over enough people with many different estimation criteria, I wouldn't trust there to necessarily be a demonstrable bias over repeated experiments without deliberate intervention on the part of the experimenter

This can be seen simply as a version of the central limit theorem: any sum or average of independent samples from ANY distribution (with finite mean and standard deviation) will be approximately normally distributed (Gaussian), with the approximation better for larger samples. Neato!
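For anyone who wants to check this numerically, here is a minimal sketch using a deliberately skewed distribution (exponential with mean 1 and standard deviation 1). The sample size of 50 and the 2000 trials are arbitrary choices of mine. If the averages are roughly normal, about 68% of them should fall within one standard error of the distribution's mean.

```python
import random
import statistics

def sample_means(n_samples, n_trials, seed=0):
    """Means of n_samples draws from an exponential distribution (mean 1, sd 1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n_samples))
        for _ in range(n_trials)
    ]

n_samples = 50
means = sample_means(n_samples, n_trials=2000)
standard_error = 1.0 / n_samples ** 0.5  # sd of the mean of n_samples draws
within_one = sum(abs(m - 1.0) <= standard_error for m in means) / len(means)
print(f"mean of means: {statistics.fmean(means):.3f}")
print(f"fraction within one standard error: {within_one:.2f}")
```

Note that this only describes how the averages spread around the distribution's own mean. Whether that mean sits on the true answer is a separate question, which is where a shared directional error would show up.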

khafra

According to a study cited in the Model Thinking class from Coursera.org, this is correct. Crowds which can be collectively characterized as a hedgehog do not have wisdom; crowds which are collectively foxes do have wisdom. The diversity of models is key.
