AnnaSalamon comments on Navigating disagreement: How to keep your eye on the evidence - Less Wrong
Re: problem 1: Jelly bean number estimates are just like thermometer readings, except that the reading is in someone’s head, rather than their hand. So the obvious answer is to average everyone’s initial, solitary impressions, absent reason to expect one individual or another is an above-average (or below-average) estimator.
If your friends use lopsided weighting schemes in their second answers, should you re-update? This depends a lot on your friends.
Since I know those people, I would weight their answers according to my best estimate of their skill at such tasks, and then average the whole group, including me.
Doing this correctly can get pretty complicated. Basically, the more people you have, the less you should weight the low-quality estimates compared to the high-quality estimates.
For example, suppose that "good" thermometers are unbiased and "bad" thermometers are all biased in the same direction, but you don't know which direction.
If you have one thermometer which you know is good, and one which you're 95% sure is good, then you should weight both measurements about the same.
But if you have 10^6 thermometers which you know are good, and 10^6 which you're 95% sure are good, then you should pretty much ignore the possibly-bad ones.
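A toy simulation of this point (my sketch, not from the thread): assume the uncertain batch, with 5% probability, shares a single unknown bias, as in the thermometer setup above. All the numbers (true temperature, noise, bias size) are made up for illustration.

```python
import random

random.seed(0)

TRUE_TEMP = 20.0   # hypothetical true temperature
NOISE_SD = 1.0     # per-reading noise (assumed)
BIAS = 3.0         # shared bias of a "bad" batch (assumed)
P_GOOD = 0.95      # chance the uncertain batch is actually good

def simulate(n):
    """One trial: n known-good readings plus n readings from a batch that,
    with probability 1 - P_GOOD, shares a single unknown bias.
    Returns (average of everything, average of known-good only)."""
    good = [random.gauss(TRUE_TEMP, NOISE_SD) for _ in range(n)]
    shift = random.choice([-BIAS, BIAS]) if random.random() > P_GOOD else 0.0
    uncertain = [random.gauss(TRUE_TEMP + shift, NOISE_SD) for _ in range(n)]
    return sum(good + uncertain) / (2 * n), sum(good) / n

def rmse(n, trials=200):
    """Root-mean-square error of each strategy over many trials."""
    se_all = se_good = 0.0
    for _ in range(trials):
        avg_all, avg_good = simulate(n)
        se_all += (avg_all - TRUE_TEMP) ** 2
        se_good += (avg_good - TRUE_TEMP) ** 2
    return (se_all / trials) ** 0.5, (se_good / trials) ** 0.5

# Compare the two strategies at small vs. large n:
print("n=2:    ", rmse(2))
print("n=10000:", rmse(10000))
```

With few readings the extra data is worth the bias risk, so pooling everything tends to have lower error; with many readings the known-good average is already extremely precise, so the possibly-bad batch contributes almost pure downside.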
Not that it matters tremendously, but I was thinking of the jelly bean problem.
What kind of weighted average?
My math isn't good enough to formalize it-- I'd do it by feel.
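One standard way to formalize it (my suggestion, not necessarily what either commenter had in mind): if you can attach an error standard deviation to each person's estimate, weight each estimate by the inverse of its variance. The numbers below are invented for illustration.

```python
def inverse_variance_mean(estimates, sds):
    """Precision-weighted average: weight_i = 1 / sd_i**2.
    This is the optimal combination for independent, unbiased
    Gaussian errors; with equal sds it reduces to the plain mean."""
    weights = [1.0 / sd ** 2 for sd in sds]
    return sum(w * x for w, x in zip(weights, estimates)) / sum(weights)

# A sharper estimator (sd = 50) pulls the answer toward itself:
print(inverse_variance_mean([900, 1200, 1500], [50, 200, 400]))
```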
Drat - likewise.
Before reading your answer: Human beings are bad at estimating volumes, as opposed to lengths. I would form my estimate by observing the apparent density of jellybeans in the jar (e.g. by examining a square-centimeter cross-section), observing the jar's dimensions, and multiplying. Then, in the second stage, I would discard estimates radically different from mine (with the cutoff chosen based on the observed distribution) and take the mean of the remaining. I would allow myself to be influenced in my choice of which data to include by those whose data I was already inclined to include in my average.
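The second-stage step here can be sketched as a trimmed consensus (my sketch; the cutoff is a stand-in for the "chosen based on observed distribution" part, and all numbers are invented):

```python
def trimmed_consensus(my_estimate, others, cutoff=0.5):
    """Keep my own estimate plus any answer within a fractional
    distance `cutoff` of it, then average the survivors.
    cutoff=0.5 is an illustrative choice, not from the comment."""
    kept = [my_estimate] + [x for x in others
                            if abs(x - my_estimate) <= cutoff * my_estimate]
    return sum(kept) / len(kept)

# 4000 and 120 are "radically different" and get discarded:
print(trimmed_consensus(1000, [850, 1100, 4000, 120]))
```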
After reading your answer: Should I notice a popular upweighting of certain responses, such as you suggest, I would increase the weight of those responses in my average.
I would look for response clusters. Each participant could use a different counting method that yields a different result (e.g. estimating volumes, counting radius and height, or assuming there's an empty cone at the top which you don't see), and some methods could be common pitfalls. Therefore some results, namely those obtained by a wrong way of counting, should be discarded; otherwise the median result would lead away from the right one. To decide which response cluster is right, it would be useful to try to figure out each method or mistake and determine which one is correct. Of course, your method is not necessarily the right one just because it's yours.
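Finding those clusters in a one-dimensional list of guesses can be done very simply (my sketch: sort the estimates and start a new cluster at any large relative gap; the threshold and sample numbers are invented for illustration):

```python
def cluster_estimates(values, gap_ratio=0.3):
    """Group sorted estimates into clusters, starting a new cluster
    whenever the relative gap to the previous value exceeds gap_ratio.
    gap_ratio=0.3 is an illustrative threshold, not from the comment."""
    vals = sorted(values)
    clusters = [[vals[0]]]
    for prev, cur in zip(vals, vals[1:]):
        if (cur - prev) / prev > gap_ratio:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters

# Three apparent methods -> three clusters:
print(cluster_estimates([480, 500, 520, 900, 950, 2000]))
```

Once the clusters are separated, you can inspect each one and ask what counting method (or mistake) would produce it, rather than blindly taking the overall median.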