Douglas_Knight comments on The usefulness of correlations - Less Wrong
Comments (52)
The article starts and ends with the claim that "probability" is inferior to "real measurement", but I have no idea what the distinction is supposed to be. Salviati's attempt at "real measurement" got him a much better instrument, Z, than Simplicio got when he "collects some experimental data," but that doesn't mean anything. There's no point of view that's going to make Y look like a better instrument than Z. I suppose someone might be impressed by the p<.001 claim that cor(X,Y) > .5, but if Salviati knows that cor(X,Z) > .95 with p<.1, he probably knows that cor(X,Z) > .5 with p<.001 anyhow!
It certainly is true that a correlation of .6 doesn't give you a good measurement. (Is that the point?)
I concur. This has nothing to do with the relevance or value of probability and statistics; it's just debunking the idea that a correlation coefficient that's substantial but not very close to ±1 gives you much predictive power.
What makes Simplicio's performance worse than Salviati's isn't the fact that he's using probability and statistics. It's the fact that the information he has available is very poor. Describing what he's got in terms of correlation coefficients has, at most, the effect of obscuring just how terrible they are, but that's not a problem with probability and statistics, it's a problem with not understanding probability and statistics.
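To put a rough number on "predictive power": under the assumption (not stated in the thread) that you predict X from the other variable by ordinary least-squares linear regression, the standard deviation of the prediction error is sqrt(1 - r^2) times sd(X). The function name below is made up for illustration; this is a sketch, not anything taken from the article.

```python
import math

def residual_sd_fraction(r):
    """Fraction of sd(X) left unexplained after the best linear
    prediction of X from a variable whose correlation with X is r.
    Assumes least-squares linear prediction (an assumption of this sketch)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return math.sqrt(1.0 - r * r)

# Compare a "substantial" correlation with one very close to 1.
for r in (0.6, 0.95, 0.995):
    print(f"r = {r}: error sd is {residual_sd_fraction(r):.3f} x sd(X)")
```

On this measure a correlation of 0.6 still leaves about 80% of the original spread in X, whereas 0.995 leaves roughly 10%.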
Douglas_Knight:
That is part of it.
gjm:
That is more of it.
gjm:
And this is the final part. As a matter of practical fact -- look at almost any scientific paper that presents correlation coefficients -- if you are calculating correlations, 0.6 is about typical of the correlations you will be finding, and I think I'm being generous there. The reason you don't see correlations of 0.995 reported, let alone 0.99995 (i.e. a measurement to two significant figures) is that if your data were that good, you wouldn't waste your time doing statistics on them. A correlation of 0.6 means that you have poor data and almost no predictive capacity. It takes a correlation of 0.866 to get even 1 bit of mutual information. How often do you see correlations of that size reported?
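The 0.866 figure can be checked if one assumes the two variables are jointly Gaussian, in which case the mutual information is -0.5 * log2(1 - r^2) bits. That distributional assumption, and the function name, are mine for the purpose of this sketch; other distributions would give other numbers.

```python
import math

def gaussian_mutual_information_bits(r):
    """Mutual information, in bits, between two jointly Gaussian
    variables with correlation coefficient r.
    Assumes a bivariate normal distribution (an assumption of this sketch)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return -0.5 * math.log2(1.0 - r * r)

# A typical reported correlation, the 1-bit threshold, and a good measurement.
for r in (0.6, math.sqrt(0.75), 0.995):
    print(f"r = {r:.3f}: {gaussian_mutual_information_bits(r):.3f} bits")
```

Since sqrt(0.75) is about 0.866, that is where the formula gives exactly 1 bit; a correlation of 0.6 comes out at roughly a third of a bit.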
Statistics is the science of precisely wringing what little information there is from foggy data. And yet, people keep on drawing lines through scatterplots and summarising results as "X's are Y's", even when the implied prediction does only fractionally better than chance.
Eliezer wrote: "Let the winds of evidence blow you about as though you are a leaf, with no direction of your own", which is very inspiring, but in practical terms cannot be taken literally. If you are being blown up and down the probability scale, your probabilities are nowhere near 0 or 1. You can only be easily swayed when you are ignorant. You can only remain easily swayed by remaining ignorant. The moment you acquire knowledge, instead of precisely measured ignorance, you are wearing lead-weighted boots.
That's what I took the point to be. The initial descriptions of what Simplicio and Salviati accomplished make them sound comparable. It wouldn't occur to most that one was overwhelmingly superior to the other. But working it out shows otherwise.
It's true that a lot is buried in the line "Salviati instead tries to measure X, and finds a variable Z which is experimentally found to have a good chance of lying close to X." What was required to establish this "experimental finding"? It might have taken labors far in excess of Simplicio's. But now we know that, unless Salviati had to do much, much more work, his approach is to be preferred.
I think the superiority will be obvious to anyone who's ever seen a few scatterplots of correlated variables, and who can imagine a graph of X against X + noise where sd(noise) = 0.1*sd(X), and who thinks for a moment. Of course many people, much of the time, won't actually think for a moment, but that's a very general problem that can strike anywhere.
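For anyone who would rather compute than imagine the graph, here is a minimal sketch. It assumes the noise is independent of X, in which case cor(X, X + noise) = 1 / sqrt(1 + k^2) with k = sd(noise)/sd(X). The simulation uses Gaussian X and noise purely for convenience; all function names are illustrative.

```python
import math
import random

def corr_with_noise(k):
    """Correlation between X and X + noise when sd(noise) = k * sd(X)
    and the noise is independent of X."""
    return 1.0 / math.sqrt(1.0 + k * k)

def noise_ratio_for_corr(r):
    """Inverse of corr_with_noise: the sd(noise)/sd(X) ratio giving correlation r."""
    return math.sqrt(1.0 / (r * r) - 1.0)

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_corr(k, n=20000, seed=0):
    """Estimate cor(X, X + noise) from n simulated Gaussian samples."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [x + rng.gauss(0.0, k) for x in xs]
    return pearson(xs, ys)

print("noise at 0.1 sd(X):", corr_with_noise(0.1), simulate_corr(0.1))
print("noise ratio needed for r = 0.6:", noise_ratio_for_corr(0.6))
```

Under these assumptions, noise at a tenth of sd(X) corresponds to a correlation of about 0.995, while a correlation of 0.6 corresponds to noise about 1.33 times as large as the spread of X itself.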
Suppose the story had gone like this: Simplicio measures X, and does it so well that his measurement has a correlation of 0.6 with X. Salviati examines lots of pairs (X,Y) and finds that X and Y typically differ by about 0.1 times the s.d. of X. Then the result would have been the same as before. Would that be a reason to say "measurement is no good; use probability and statistics instead"? Of course not.
Indeed. What matters is not what the procedures are called, but how they compare. Salviati's results completely trump Simplicio's.
Correlation, maybe?