Wisdom of the Crowd: not always so wise

tgb

I have a confession to make: I have not been "publishing" the results of an experiment because the results were uninteresting. You may recall that some time ago I made a post asking people to take a survey so that I could look at a small variation of the typical "Wisdom of the Crowds" experiment, in which people estimate a value and the average of the crowd's estimates is better than all or almost all of the individual estimates. Since LessWrong is full of people who like to do these kinds of things (thank you!), I got 177 responses - many more than I was hoping for!

I am now coming back to this since I happened upon an older post by Eliezer saying the following:

When you hear that a classroom gave an average estimate of 871 beans for a jar that contained 850 beans, and that only one individual student did better than the crowd, the astounding notion is not that the crowd can be more accurate than the individual. The astounding notion is that human beings are unbiased estimators of beans in a jar, having no significant directional error on the problem, yet with large variance. It implies that we tend to get the answer wrong but there's no systematic reason why. It requires that there be lots of errors that vary from individual to individual - and this is reliably true, enough so to keep most individuals from guessing the jar correctly. And yet there are no directional errors that everyone makes, or if there are, they cancel out very precisely in the average case, despite the large individual variations. Which is just plain odd. I find myself somewhat suspicious of the claim, and wonder whether other experiments that found less amazing accuracy were not as popularly reported.

(Emphasis added.) It turns out that I myself was sitting upon exactly such results.

The results are here. Sheet 1 shows the raw data and Sheet 3 shows some values computed from those numbers. A few values that were clearly either jokes or mistakes (like not noticing the answer was in millions) were removed. In summary: according to Wikipedia, there were 1000 million people in Africa as of 2009, whereas the estimate from LessWrong was 781 million; and the first transatlantic telephone call happened in 1926, whereas the average from the poll was 1899.

There! I've come clean!

I had deferred making this public because I thought the result that I was trying to test wasn't really being tested in this experiment, regardless of the results. The idea (see my original post linked above) was to see whether selecting between two choices would still let the crowd average out to the correct value (this two-option choice was meant to reflect the structure of some democracies). But how to interpret the results? It seemed that my selection of values was too important, and that the average would change depending on what I picked even if everyone were to make an estimate, then look at the two options and choose the closer one. So perhaps the only result of note here is that, for the questions given, Less Wrong users were not particularly great at being a wise crowd.

Let's say you're asking a thousand people to guess the date of the Battle of Bosworth Field. If I asked this right now on Less Wrong, I imagine it would receive some wildly different answers.

If you're me, and you remember it because its anniversary is on your birthday (or if you were paying attention in a specific history class), you'll know the exact year (1485). These people are probably not very numerous, but their answers will all coincide. This subgroup has a variance of zero.

All the people who were paying only a little bit of attention in that history class, or watched the first series of Blackadder, will not know the exact date, but they'll probably guess to within a few decades. This subgroup has a wider variance, but it's still pretty tight, and they're answering a convergent question. There's a correct answer, and the answer these people give is informed by it, even if it's not correct. In the absence of systematic bias, we would expect roughly the same number of people to answer 1480 as 1490, and so the mean of this group should converge.

We now look at a wider-variance subgroup, which includes all the people who only have a sketchy idea of when this battle was and what it was about. Some people will recall it's got something to do with the Tudor dynasty, and that Henry VIII was early 16th century. Some will recall that there was a King Richard involved, and dig up a late 14th century connection. They are all contributing some information to proceedings (14th-16th century), but in the absence of systematic bias, we'd expect people to be as wrong on one side as they are on the other. Even greater-variance subgroups, who aren't sure whether this battle was fought by Romans or Crusaders or Confederates, are still contributing some small quantity of information by giving answers in the range of human history. No-one's going to say 3991 AD, or 6,000,000 BC.

As the variance gets wider, the population of any given subgroup gets larger, but the coherence of their answers gets smaller. If you take a hundred people who have absolutely no knowledge of human history and ask them when the Battle of Bosworth Field occurred, you're basically asking them to pick a number. Their answers aren't going to converge on anything, so they won't systematically interfere with the overall distribution, while the answers that are more informed will converge on the correct answer.

But systematic bias does occur. American education on non-American history is notoriously sketchy. If our participants included a large number of Americans, they're more likely to guess a date in American history through the availability heuristic. All of a sudden, the uninformed answers will start converging at some point in the late 19th Century, which will skew the overall distribution and pull the mean forward in time. The least wise parts of the crowd suddenly found a way to be a whole lot louder.

That's what I meant by your noise converging on the same answer. In giving people an incorrect choice, you're giving all the people who have no knowledge an opportunity to pick the same incorrect answer. If they didn't have that answer to converge on, the mean of their answer wouldn't be able to exert as much influence on the overall distribution.

Does that make sense?

(This does also point to an obvious source of systematic bias when dealing with dates: we have better records [and hence more available knowledge] of events closer to the present. History is lumpy, and forward-weighted, so any uninformed guess on the date of an event in the past is going to be distorted around points of greater historical interest, many of which occurred over the last century.)
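The argument above can be sketched as a small simulation. This is only an illustration under made-up assumptions: the group sizes (300 informed, 700 uninformed), the spreads, and the 1870 cluster for the "American history" guesses are all hypothetical numbers chosen for the example, and the "symmetric" noise is deliberately centred on the true year to stand in for "no systematic bias".

```python
import random
import statistics

TRUE_YEAR = 1485  # Battle of Bosworth Field, as given above


def crowd_mean(n_informed, n_uninformed, uninformed_guess, seed=0):
    """Mean guess of a crowd mixing informed and uninformed people."""
    rng = random.Random(seed)
    # Informed people: close to the truth, off by a couple of decades (assumed).
    informed = [rng.gauss(TRUE_YEAR, 20) for _ in range(n_informed)]
    uninformed = [uninformed_guess(rng) for _ in range(n_uninformed)]
    return statistics.fmean(informed + uninformed)


def symmetric_noise(rng):
    # Hypothetical unbiased noise: as likely to be too early as too late.
    return rng.uniform(TRUE_YEAR - 500, TRUE_YEAR + 500)


def clustered_noise(rng):
    # Hypothetical shared bias: uninformed guesses pile up in the late 1800s.
    return rng.gauss(1870, 30)


unbiased = crowd_mean(300, 700, symmetric_noise)
biased = crowd_mean(300, 700, clustered_noise)

print(f"symmetric noise: crowd mean off by {unbiased - TRUE_YEAR:+.0f} years")
print(f"clustered noise: crowd mean off by {biased - TRUE_YEAR:+.0f} years")
```

With symmetric noise the uninformed majority mostly cancels itself out, so the crowd mean stays near 1485. Once the same majority shares a common wrong answer, the mean is dragged centuries forward, however accurate the informed minority is.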

This seems like a roundabout way to describe a bell curve...

But suppose in your example that we're only asking those silly Americans, who, like myself, have only even heard of the Battle of Bosworth as a name and really know nothing about it except maybe some English people were involved or something. And so let's assume that people are guessing as a bell curve around 1600 with a large standard deviation of, say, 200 years or so. If the two options are 1600 and 1200, let's say, then about 15.9% of the people will be guessing 1200 (i.e. think it's earlier than 1400) and th...
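The percentage in this example can be checked directly. The sketch below assumes the "200 years" is the standard deviation of a normal distribution centred on 1600, and that each person picks whichever option is nearer to their private guess, so the cut-off is the midpoint, 1400. The final step, averaging the two options by vote share, is an extrapolation for illustration and not necessarily where the truncated comment was heading.

```python
from statistics import NormalDist

# Assumed distribution of private guesses: mean 1600, standard deviation 200.
guesses = NormalDist(mu=1600, sigma=200)

option_early, option_late = 1200, 1600
midpoint = (option_early + option_late) / 2  # 1400: below this, 1200 is nearer

# Share of people whose private guess is closer to the early option.
share_early = guesses.cdf(midpoint)
share_late = 1 - share_early

# Average of the votes when everyone must pick one of the two options.
crowd_average = share_early * option_early + share_late * option_late

print(f"share choosing {option_early}: {share_early:.1%}")
print(f"average of the two-option votes: {crowd_average:.1f}")
```

A cut-off one standard deviation below the mean puts just under 16% of people on the early option, so the two-option average lands a little below 1600. Change the pair of options on offer and this average moves with them, even though nobody's underlying belief has changed.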
