No, Americans Don't Think Foreign Aid Is 26% of the Budget
I hate the polling question "What percentage of the US budget goes to foreign aid?" Or, more precisely, I hate the way the results are interpreted.

The way these polls are reported is essentially guaranteed to produce a wild overestimate, which inevitably leads experts to write "how wrong Americans are" pieces, like this Brookings article claiming that "Americans believe foreign aid is in the range of 25 percent of the federal budget," or KFF[1] reporting that the "average perceived amount spent on foreign aid was 26%."

But this isn't just ignorance. The real problem is a failure of measurement and the statistics used to summarize it. The story isn't "Americans are clueless" (though that may also be true), it's "pollsters are using the wrong math."

The Real Problem: Arithmetic Mean + Small Numbers

The problem is that pollsters ask for a percentage, then take the arithmetic mean to represent the data. For small true values, this approach is structurally doomed, and it has nothing to do with foreign aid specifically. It has to do with how we summarize guesses about small numbers.

When the true value is small, guesses are bounded at zero but unbounded above. That is, nobody can guess negative percentages, but anyone can guess 50% or 80%. On top of that, people tend to respond with round numbers like 5% or 20%, not decimals like 0.05% or 0.15%. This means that, even if there are many guesses around the true value of ~1%, there can only be outliers in the positive direction, so it results in a right-skewed distribution.

If we choose the arithmetic mean as the average, it will be dragged upward by the right tail. A handful of overestimates skew the whole average.

This isn’t a sampling problem, and it won’t go away with more data. With more data, the arithmetic mean converges to the population arithmetic mean, but in a right-skewed distribution, that number is systematically higher than the median or geometric mean. A larger sample just gives you a more precise estima
Yeah, I think in that case the best thing to do would be to use log-odds aggregation. That would be symmetric.
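Here is a minimal sketch of what I mean, in Python. The guesses are made up for illustration (they are not real poll data), and the helper names `logit`, `inv_logit`, and `log_odds_mean` are my own. It compares the arithmetic mean, the median, and the geometric mean with a log-odds average on a right-skewed set of guesses.

```python
import math
import statistics


def logit(p):
    """Map a fraction strictly between 0 and 1 to log-odds."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Map log-odds back to a fraction between 0 and 1."""
    return 1 / (1 + math.exp(-x))


def log_odds_mean(fractions):
    """Average the guesses in log-odds space, then map back.

    Guesses of exactly 0 or 1 have infinite log-odds, so they would
    need to be clipped or handled separately before calling this.
    """
    return inv_logit(statistics.fmean([logit(p) for p in fractions]))


# Hypothetical guesses, in percent: many low answers plus a long right tail.
guesses_pct = [1, 1, 1, 2, 2, 5, 5, 10, 10, 20, 25, 50, 80]
fractions = [g / 100 for g in guesses_pct]

arithmetic = statistics.fmean(fractions)
median = statistics.median(fractions)
geometric = statistics.geometric_mean(fractions)
pooled = log_odds_mean(fractions)

for name, value in [
    ("arithmetic mean", arithmetic),
    ("median", median),
    ("geometric mean", geometric),
    ("log-odds mean", pooled),
]:
    print(f"{name:>16}: {value:.1%}")
```

The symmetry is that averaging the complements (1 - p) in log-odds space gives exactly the complement of the original result, so it makes no difference whether you ask what share goes to foreign aid or what share goes to everything else. The geometric mean does not have that property.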