Lumifer comments on The Triumph of Humanity Chart - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (77)
You're thinking about this in terms of forecasting. This is not forecasting, this is historical studies.
Consider the hard sciences equivalent: you take, say, some geneticists and try to figure out whether their estimates of which genes cause what are any good by asking them questions about quantum physics to "check how they are calibrated".
No. Bayesian estimate calibration is most often used in forecasting, but it's effective in any domain in which there's uncertainty, including the hard sciences. In fact, calibration training is often done either with numerical trivia, using 90% credible intervals, or with true-or-false questions using a single percentage estimate. I recommend checking out "How to Measure Anything" for a more in-depth treatment.
Yes, that's essentially how it works, except that you then give them feedback to see if they're over- or underconfident. They'd have to be relatively easy questions, though, otherwise all the estimates would cluster around fifty percent and it wouldn't be very useful training for high-resolution answers.
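The feedback loop described here can be sketched as a simple scorer. This is a hypothetical illustration (the function name, bucketing choice, and sample session are all made up), not a description of any particular training tool:

```python
# Calibration feedback for true/false questions answered with a stated
# confidence. Groups answers into confidence buckets and compares the
# stated confidence with the actual accuracy in each bucket.
from collections import defaultdict

def calibration_feedback(answers):
    """answers: list of (stated_confidence, was_correct) pairs."""
    buckets = defaultdict(list)
    for confidence, correct in answers:
        # Bucket answers by confidence rounded to the nearest 10%.
        buckets[round(confidence, 1)].append(correct)
    report = {}
    for confidence, results in sorted(buckets.items()):
        accuracy = sum(results) / len(results)
        if accuracy < confidence:
            verdict = "overconfident"
        elif accuracy > confidence:
            verdict = "underconfident"
        else:
            verdict = "well calibrated"
        report[confidence] = (accuracy, verdict)
    return report

# Made-up training session: ten answers given at 90% confidence,
# of which only seven were correct.
session = [(0.9, True)] * 7 + [(0.9, False)] * 3
print(calibration_feedback(session))  # {0.9: (0.7, 'overconfident')}
```

The feedback is the point: seeing "you said 90%, you were right 70% of the time" repeatedly is what moves people toward calibration.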
Citation needed.
Not all uncertainty is created equal. If uncertainty comes from e.g. measurement limitations, the Bayesian calibration is useless.
Note that science is mostly about creating results that can be replicated by anyone regardless of how well or badly calibrated they are.
That's how you imagine it to work, since I don't expect anyone to actually be doing this. But let's see: assume we have successfully run the calibration exercises with our group of geneticists. What do you expect them to change in their studies of which genes do what? We can get even more specific. Let's say we're talking about one of the twin studies where the author tracked a set of twins, tested them on some phenotype feature X, and is reporting that the twins correlate Y% while an otherwise-similar general population correlates Z%. What results would better calibration affect?
That was an overconfident statement, but for more on how calibration is useful in places other than forecasting, check out "How to Measure Anything", as mentioned in the previous comment.
Once calibrated, they can make estimates on how sure they are of certain hypotheses, and of how likely treatments based on those hypotheses would lead to lives saved. This in turn can allow them to quantify what experiment to run next using value of information calculations.
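The value-of-information calculation mentioned here can be sketched as an expected value of perfect information (EVPI) computation. All the numbers, names, and payoffs below are hypothetical, purely to show the shape of the calculation:

```python
def evpi(scenarios):
    """scenarios: list of (probability, {action: payoff}).
    EVPI = expected payoff if the true scenario were known in advance,
    minus the payoff of the best single action chosen under uncertainty."""
    actions = scenarios[0][1].keys()
    # Best action chosen now, before running any further experiment.
    best_now = max(sum(p * payoffs[a] for p, payoffs in scenarios)
                   for a in actions)
    # Expected payoff if a decisive experiment revealed the true scenario.
    with_info = sum(p * max(payoffs.values()) for p, payoffs in scenarios)
    return with_info - best_now

# Hypothetical: a calibrated geneticist is 60% sure gene G drives disease D.
# Payoffs are lives saved by funding treatment A (targets G) or B (doesn't).
scenarios = [(0.6, {"A": 100, "B": 20}),   # G really is causal
             (0.4, {"A": 0,   "B": 20})]   # G is not causal
print(evpi(scenarios))
```

Here the EVPI comes out positive, which is the quantified version of "the experiment resolving whether G is causal is worth running before committing to a treatment."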
Furthermore, by surveying many of these calibrated genetic experts and then extremizing their results, you can get an idea of how likely certain hypotheses are to be correct.
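Extremizing is typically done by averaging the experts' probabilities in log-odds space and then pushing the result away from 0.5. A minimal sketch; the extremizing factor of 2.5 is just an illustrative choice, not a recommended value:

```python
import math

def extremize(probabilities, factor=2.5):
    """Aggregate expert probabilities by averaging in log-odds space,
    then multiplying by an extremizing factor > 1 before converting back."""
    log_odds = [math.log(p / (1 - p)) for p in probabilities]
    mean = sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-factor * mean))

# Three calibrated experts each say 70%: the aggregate is pushed past 70%,
# on the reasoning that independent agreement is stronger evidence than
# any single expert's estimate.
print(round(extremize([0.7, 0.7, 0.7]), 3))
```

The intuition for the factor greater than 1 is that individual forecasters each hold only part of the available evidence, so when several of them independently lean the same way, the pooled estimate should lean further than their simple average.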
I don't know if you read scientific papers, but they don't "make estimates on how sure they are of certain hypotheses". They present the data and talk about the conclusions and implications that follow from the data presented. The potential hypotheses are evaluated on the basis of data, not on the basis of how well calibrated a particular researcher feels.
Calibration is good for guesstimates, it's not particularly valuable for actual research.
That's forecasting. Remember, we're not talking about forecasting.
I'm not really sure how to answer this because I think you misunderstand calibration.
Science moves forward through something called scientific consensus. How does scientific consensus work right now? Well, we just kind of use guesswork. Expert calibration is a more useful way to understand what the scientific consensus actually is.
No, it's a decision model. The decision model uses a forecast "How many lives can be saved", but it also uses calibration of known data "Based on the data you have, how sure are you that this particular fact is true".
No. This is absolutely false. Science moves forward through being able to figure out better and better how reality works. Consensus is really irrelevant to the process. The ultimate arbiter is reality regardless of what a collection of people with advanced degrees can agree on.
That has nothing to do with calibration. "How many lives can be saved" is properly called a point forecast which provides an estimate of the center of the distribution. These are very popular but also limited because a much more useful forecast would come with an expected error and, ideally, would specify the shape of the distribution as well.
"Based on the data you have, how sure are you that this particular fact is true" is properly a question about the standard error of the estimate and it has nothing to do with subjective beliefs (well-calibrated or not) of the author.
I only care about someone's calibration if I'm asking him to guess. If the answer is "based on the data", it is based on the data and calibration is irrelevant.
While this is completely true, and consensus only plays a minor role in science, it's not true that consensus is irrelevant. Given no other information about a certain hypothesis other than that the majority of scientists believe it to be true, the rational course of action would be to adjust belief in the hypothesis upward. Of course, evidence contradicting the hypothesis would nullify this consensus effect. Even a small amount of evidence trumps a large consensus.
No, that's the popular conception of science, but unfortunately it's not an oracle that pronounces theories true or false. What observation and experiments give us are varying levels of evidence that can falsify some hypotheses and point towards the truth of other hypotheses. We then use human reasoning to put all this evidence together and let humans decide how sure they are of something. If they have lots and lots of evidence, that thing can become a "theory", based on the consensus that there's quite a lot of it and it's really good, and even more, even better evidence makes that thing a "law". But it's based on a subjective sense of "how good these data are."
Not quite. It also has to do with all the other previous experiments done, your certainty in the model itself, your ideas about how reality works, and a lot of other things.
Yes, ideally this would be a credible interval with an estimated distribution, but even a credible interval assuming a uniform distribution would be very useful for this purpose.
In terms of calibration, the better calibrated someone is, the more sure you can be that if they give 100 estimates as 90% credible intervals, around 90 of the true values will fall within the intervals they gave.
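That check is easy to express directly. A sketch, with made-up intervals (`true_value` stands for whatever was actually measured after the fact):

```python
def interval_hit_rate(estimates):
    """estimates: list of (low, high, true_value) tuples, where (low, high)
    is a stated 90% credible interval. A well-calibrated estimator
    should score close to 0.90 over many estimates."""
    hits = sum(1 for low, high, true in estimates if low <= true <= high)
    return hits / len(estimates)

# Made-up sample: 9 of 10 intervals contain the true value.
sample = [(0, 10, 5)] * 9 + [(0, 10, 42)]
print(interval_hit_rate(sample))  # 0.9
```

A hit rate well below 0.9 indicates overconfidence (intervals too narrow); well above, underconfidence (intervals too wide).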
Well-calibrated people will base their guesses on data; poorly calibrated people will not. Your understanding of calibration isn't in line with research done by Douglas Hubbard, Philip Tetlock, and others who study human judgement.
Heh. Do you mean that's a conception of science held by not-too-smart uneducated people? X-)
Sense make not. Reality is always true.
Speaking generally, you seem to treat science as people asserting certain things and so, to decide on how much to trust them, you need to know how calibrated those people are. That seems very different from my perception of science which is based on people saying "This is so, you can test it yourself if you want".
Under your approach, the goal is achieving consensus. Under my system, the goal is to provide replicability and show that it actually works.
Data does not depend on calibration of particular people.
I think we have to separate two ideas here:
1. There's the data you get from an experiment.
2. There are the conclusions you can draw from that data.
I would agree that the data does not depend on the calibration of particular people. But the conclusions you get from that data DO need to be calibrated. Furthermore, other scientists may want to do experiments based on those conclusions, and their decision to do that will largely be based on how likely they think it is that the conclusions are accurate. The process of science is building new conclusions on the basis of old conclusions; if it were just about gathering the data, you would never gain a deeper understanding of reality.
Aren't both these views of science oversimplifications? I mean, in practice most of the people making use of the work scientists have done aren't really testing the scientists' work for themselves (they're kinda doing it implicitly by making use of that work, but the whole point is that they are confident it's not going to fail).
Reality certainly is the ultimate arbiter, but regrettably we don't get to ask Reality directly whether our theories are correct; all we can do is test them somewhat (in some cases it's not even clear how to begin doing that; I'm looking at you, string theory) and that testing is done by fallible people using fallible equipment, and in many cases it's very difficult to do in a way that actually lets you separate the signal from the noise, and most of us aren't well placed to evaluate how fallibly it's been done in any given case, and in practice usually we have to fall back on something like "scientific consensus" after all.
I think you and MattG are at cross purposes about the role he sees for calibration in science. The process by which actual primary scientific work becomes useful to people who aren't specialists in the field goes something like this: researchers like Alice (and Aloysius, and Alex's lab) do the primary work and publish their results; other scientists like Bob, Beth, and Bill read those papers and form opinions; and a consensus gradually emerges from those opinions.
Calibration (in the sense we're talking about here) isn't of much relevance to Alice when she's doing the primary research. She will report that the Daily Mail is positively associated with brain cancer in rats (RR=1.3, n=50, CI=[1.1,1.5], p=0.01, etc., etc., etc.) and that's more or less it. (I take it that's the point you've been making.)
But Bob's opinion about the carcinogenicity of the Daily Mail (having read Alice's papers) is an altogether slipperier thing; and the opinion to which he and Beth and the others converge is slipperier still. It'll depend on their assessment of how likely it is that Alice made a mistake, how likely it is that Aloysius's results are fraudulent given that he took a large grant from the DMG Media Propaganda Fund, etc.; and on how strongly Bob is influenced when he hears Bill say "... and of course we all know what a shoddy operation Alex's lab is."
It is in these later stages that better calibration could be valuable, and that I think Matt would like to see more explicit reference to it. He would like Bob and Bill and Beth and the rest to be explicit about what they think and why and how confidently, and he would like the consensus-generating process to involve weighing people's opinions more or less heavily when they are known to be better or worse at the sort of subjective judgement required to decide how completely to mistrust Aloysius because of his funding.
I'm not terribly convinced that that would actually help much, for what it's worth. But I don't think what Matt's saying is invalidated by pointing out that Alice's publications don't talk about (this kind of) calibration.