It doesn't have to be a Gaussian distribution. We would expect it to look like one under reasonably assumed conditions, but systematic bias would skew it. A particularly large single source (say there was a Battle of Dosworth Field that happened 400 years later) could easily result in a bimodal distribution.
For Wisdom of Crowds to work (as it's expected to work), people don't have to be guessing along a Gaussian distribution. They're applying knowledge they have, and some of that knowledge is useful information, while some of it is noise. All the useful information pulls the mean towards the true value, while all the noise pulls it away. The difference is that the useful information converges on a single value (because it's a convergent problem with a single correct answer), while the noise pulls arbitrarily in all directions.
Provided there isn't some reason for the noise itself to converge on a single value (and I think this is where my previous comments have not necessarily been clear, I'm talking about the noise converging, not the overall mean), the noise should cancel itself out.
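To make the cancellation argument concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the true value, the crowd size, the uniform (deliberately non-Gaussian) noise, and the hypothetical `crowd_mean` helper. It only shows that noise with no preferred direction averages out, while noise that itself converges on some other value does not.

```python
import random
import statistics

def crowd_mean(true_value, n, spread, noise_center=0.0, seed=0):
    """Mean of n guesses, each equal to true_value plus uniform noise.

    noise_center is the value the noise itself converges on;
    0.0 means the noise has no preferred direction.
    """
    rng = random.Random(seed)
    guesses = [
        true_value + noise_center + rng.uniform(-spread, spread)
        for _ in range(n)
    ]
    return statistics.fmean(guesses)

TRUE_VALUE = 1000.0  # illustrative number, not taken from the survey

# Noise pulling arbitrarily in all directions.
unbiased = crowd_mean(TRUE_VALUE, n=10_000, spread=300.0)

# Noise that converges on its own value, 150 below the truth.
biased = crowd_mean(TRUE_VALUE, n=10_000, spread=300.0, noise_center=-150.0)

print(f"direction-free noise: mean error {unbiased - TRUE_VALUE:+.1f}")
print(f"converging noise:     mean error {biased - TRUE_VALUE:+.1f}")
```

In this setup the first error shrinks as the crowd grows, while the second stays near the noise's own center no matter how many people answer.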
It should be obvious that if you give people a right answer and a wrong answer, the noise will be weighted in the direction of the wrong answer (because there's no corresponding error on the other side of the true value). Even if you have two wrong answers on either side of a true value, and ask people to pick the one closest to the true value, you will still have a skew problem, because unless the two values are equidistant to the true value (which defeats the point of the question), your noise is not going to be equally distributed around the true value.
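The skew can be sketched the same way. In the toy model below, each person forms a private estimate (true value plus Gaussian noise) and then picks whichever of two offered options is closer to it. The model, the numbers, and the hypothetical `forced_choice_mean` helper are assumptions for illustration, not a description of how anyone actually answers.

```python
import random
import statistics

def forced_choice_mean(true_value, options, n=10_000, noise_sd=200.0, seed=0):
    """Average of the options picked by a crowd forced to choose between two values."""
    rng = random.Random(seed)
    low, high = sorted(options)
    picks = []
    for _ in range(n):
        # Private estimate: the truth plus direction-free noise.
        estimate = true_value + rng.gauss(0.0, noise_sd)
        # Each person reports only the option closest to their estimate.
        picks.append(low if abs(estimate - low) <= abs(estimate - high) else high)
    return statistics.fmean(picks)

TRUE_VALUE = 1000.0  # illustrative number

# One right answer and one wrong answer.
right_vs_wrong = forced_choice_mean(TRUE_VALUE, (1000.0, 1300.0))

# Two wrong answers on either side of the truth, not equidistant.
near_pair = forced_choice_mean(TRUE_VALUE, (900.0, 1300.0))
far_pair = forced_choice_mean(TRUE_VALUE, (900.0, 2000.0))

print(f"right vs wrong: {right_vs_wrong:.1f}")
print(f"900 vs 1300:    {near_pair:.1f}")
print(f"900 vs 2000:    {far_pair:.1f}")
```

With a right and a wrong option, the average can only land on or to the wrong side of the true value, since nothing is offered on the other side. With two wrong options, the average moves when the pair changes even though the simulated crowd's private estimates are identical in every run.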
I have a confession to make: I have not been "publishing" the results of an experiment because the results were uninteresting. You may recall that some time ago I made a post asking people to take a survey so that I could look at a small variation of the typical "Wisdom of the Crowds" experiment, where people make estimates of a value and the average of the crowd's estimates is better than all or almost all of the individual estimates. Since LessWrong is full of people who like to do these kinds of things (thank you!), I got 177 responses - many more than I was hoping for!
I am now coming back to this since I happened upon an older post by Eliezer saying the following
(Emphasis added.) It turns out that I myself was sitting upon exactly such results.
The results are here. Sheet 1 shows raw data and Sheet 3 shows some values from those numbers. A few values that were clearly either jokes or mistakes (like not noticing the answer was in millions) were removed. In summary: according to Wikipedia, there were 1000 million people in Africa (as of 2009), whereas the estimate from LessWrong was 781 million; and the first transatlantic telephone call happened in 1926, whereas the average from the poll was 1899.
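For a sense of scale, the gaps between those figures can be worked out directly. This is just arithmetic on the four numbers quoted above; the helper names are made up for the sketch.

```python
def signed_error(estimate, actual):
    """Estimate minus actual; negative means the crowd guessed too low."""
    return estimate - actual

def relative_error(estimate, actual):
    """Signed error as a fraction of the actual value."""
    return (estimate - actual) / actual

# Population of Africa, in millions (crowd estimate vs Wikipedia figure).
population_error = signed_error(781, 1000)
population_rel = relative_error(781, 1000)

# Year of the first transatlantic telephone call (poll average vs actual).
call_year_error = signed_error(1899, 1926)

print(f"population: {population_error} million ({population_rel:.1%})")
print(f"call year:  {call_year_error} years")
```

Both crowd values come out low: by 219 million (about 22%) on the population question and by 27 years on the telephone call.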
There! I've come clean!
I had deferred making this public because I thought the result that I was trying to test wasn't really being tested in this experiment, regardless of the results. The idea (see my original post linked above) was to see whether selecting between two choices would still let the crowd average out to the correct value (this two-option choice was meant to reflect the structure of some democracies). But how to interpret the results? It seemed that my selection of values was too important and that the average would change depending on what I picked, even if everyone were to make an estimate, then look at the two options and choose the best one. So perhaps the only result of note here is that, for the questions given, Less Wrong users were not particularly great at being a wise crowd.