Lumifer comments on The Triumph of Humanity Chart - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (77)
From the LW slack: http://www.measuringworth.com/
That site isn't going to help me with XIX century China.
I understand interest rates, and inflation, and purchasing power parity, and all that. That all works fine for more or less developed economies where people buy with money the great majority of what they consume.
The charts posted claim to reflect the entire world and they go back to early XIX century. Whole-world data at that point is nothing but a collection of guesstimates.
Yeah. My understanding is you basically get a bunch of economists in the room to break down the problem into relevant parts, then get a bunch of historians in the room, calibrate them, get them to give credible intervals for the relevant data, and plug it all into the model.
Is this how you think it works or is this how you think it should work?
In particular, I am curious about the "calibrating historians" part. You're going to calibrate experts against what?
It's how I think it works.
Known historical data (which they don't know).
The problem is that you want to use the best experts you have. If you are going to try to calibrate them in their field, they know it (and might have written the textbook you're calibrating them against), and if you're trying to calibrate them in the field they haven't studied, I'm not sure it's relevant to the quality of their studies.
As to "how it works", I'm pretty sure no one is actually trying to calibrate historians. I suspect the process actually works by looking up published papers and grabbing the estimates from them without any further thought -- at best. At worst you have numbers invented out of thin air, straight extrapolation of available curves, etc. etc.
Resolution and calibration are separate. They may have lower resolution in other fields but they shouldn't have lower calibration.
Edit: Thought about the previous comment, but it's not true. One thing they talk about in superforecasting is that people tend to be overconfident in their own fields while better calibrated in others.
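To make the calibration/resolution distinction concrete, here is a minimal sketch, assuming yes/no questions answered with probability estimates. It uses the standard decomposition of the Brier score into reliability (calibration error, lower is better), resolution (higher is better) and uncertainty. The function name and the toy data are made up for illustration.

```python
from collections import defaultdict

def brier_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) for binary outcomes.

    Forecasts are grouped by their exact stated probability, so that
    Brier score = reliability - resolution + uncertainty holds exactly.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Collect the observed outcomes for each distinct stated probability.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0  # gap between stated probability and observed frequency
    resolution = 0.0   # how far the observed frequencies move from the base rate
    for f, obs in groups.items():
        freq = sum(obs) / len(obs)
        reliability += len(obs) * (f - freq) ** 2
        resolution += len(obs) * (freq - base_rate) ** 2

    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty

# Toy data (hypothetical): someone outside their field who always says 50%.
vague = brier_decomposition([0.5] * 4, [1, 0, 1, 0])

# Toy data (hypothetical): someone who says 90% or 10% and is right 9 times in 10.
sharp_forecasts = [0.9] * 10 + [0.1] * 10
sharp_outcomes = [1] * 9 + [0] + [0] * 9 + [1]
sharp = brier_decomposition(sharp_forecasts, sharp_outcomes)

print("vague:", vague)
print("sharp:", sharp)
```

Both toy forecasters have a reliability term of essentially zero, so both count as well calibrated; only the second has any resolution.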
You're thinking about this in terms of forecasting. This is not forecasting, this is historical studies.
Consider the hard sciences equivalent: you take, say, some geneticists and try to figure out whether their estimates of which genes cause what are any good by asking them questions about quantum physics to "check how they are calibrated".
No. Bayesian estimate calibration is most often used in forecasting, but it's effective in any domain in which there's uncertainty, including hard sciences. In fact, calibration training is often done with either numerical trivia, using 90% credible intervals, or with true or false questions using a single percentage estimate. I recommend checking out "How to Measure Anything" for a more in-depth treatment.
Yes, that's essentially how it works, except that you then give them feedback to see if they're over- or underconfident. They'd have to be relatively easy questions though, otherwise all the estimates would cluster around fifty percent and it wouldn't be very useful training for high resolution answers.
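A minimal sketch of the 90% credible interval exercise with feedback, assuming numerical questions whose answers are known to whoever runs the exercise. The function names, the 5-point tolerance and the toy numbers are all hypothetical; this is not taken from any particular training program.

```python
def interval_hit_rate(intervals, truths):
    """Fraction of (low, high) intervals that contain the true value."""
    hits = sum(low <= truth <= high
               for (low, high), truth in zip(intervals, truths))
    return hits / len(truths)

def feedback(hit_rate, target=0.90, tolerance=0.05):
    """Compare the observed hit rate with the nominal coverage.

    The tolerance is an arbitrary illustrative choice; with only a handful
    of questions the hit rate is noisy, so a real exercise needs many more.
    """
    if hit_rate < target - tolerance:
        return "overconfident"    # intervals too narrow
    if hit_rate > target + tolerance:
        return "underconfident"   # intervals too wide
    return "roughly calibrated"

# Toy answers (made up): five stated 90% intervals and the true values.
intervals = [(10, 20), (100, 300), (1, 5), (40, 60), (0, 2)]
truths = [15, 350, 3, 75, 1]

rate = interval_hit_rate(intervals, truths)
print(rate, feedback(rate))
```

Someone whose 90% intervals contain the truth far less than 90% of the time is told to widen them, and the exercise is repeated on fresh questions.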
Citation needed.
Not all uncertainty is created equal. If uncertainty comes from e.g. measurement limitations, the Bayesian calibration is useless.
Note that science is mostly about creating results that can be replicated by anyone regardless of how well or badly calibrated they are.
That's how you imagine it to work, since I don't expect anyone to actually be doing this. But let's see, assume we have successfully run the calibration exercises with our group of geneticists. What do you expect them to change in their studies of which genes do what? We can get even more specific: let's say we're talking about one of the twin studies where the author tracked a set of twins, tested them on some phenotype feature X, and is reporting that the twins correlate Y% while an otherwise similar general population correlates Z%. Which of those results would better calibration affect?