# The usefulness of correlations

I sometimes wonder just how useful probability and statistics are. There is the theoretical argument that Bayesian probability is the fundamental method of correct reasoning, and that logical reasoning is just the limit as p=0 or 1 (although that never seems to be applied at the meta-level: what is the probability that Bayes' Theorem is true?), but today I want to consider the practice.

Casinos, lotteries, and quantum mechanics: no problem. The information required for deterministic measurement is simply not available, by adversarial design in the first two cases, and by we know not what in the third. Insurance: by definition, this only works when it's impossible to predict the catastrophes insured against. No-one will offer insurance against a risk that will happen, and no-one will buy it for a risk that won't. Randomised controlled trials are the gold standard of medical testing; but over on OB Robin Hanson points out from time to time that the marginal dollar of medical spending has little effectiveness. And we don't actually know how a lot of treatments work. Quality control: test a random sample from your production run and judge the whole batch from the results. Fine -- it may be too expensive to test every widget, or impossible if the test is destructive. But wherever someone is doing statistical quality control of how accurately you're filling jam jars with the weight of jam it says on the label, someone else will be thinking about how to weigh every single one, and how to make the filling process more accurate. (And someone else will be trying to get the labelling regulations amended to let you sell the occasional 15-ounce pound of jam.)

But when you can make real measurements, that's the way to go. Here is a technical illustration.

Prof. Sagredo has assigned a problem to his two students Simplicio and Salviati: "X is difficult to measure accurately. Predict it in some other way."

Simplicio collects some experimental data consisting of a great many pairs (X,Y) and with high confidence finds a correlation of 0.6 between X and Y. So given the value y of Y, his best prediction for the value of X is 0.6y. [Edit: that formula is mistaken. The regression line for Y against X is Y = bcX/a, assuming the means have been normalised to zero, where a and b are the standard deviations of X and Y respectively. For the Y=X+D_{1} model below, bc/a is equal to 1.]
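The corrected formula in the edit note can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original post: it borrows the value d_{1} = 4/3 from the model introduced later (the value that gives a correlation of 0.6), and the function name `regression_slopes` is purely illustrative.

```python
import math

def regression_slopes(sd_x, sd_y, corr):
    """Return (slope of Y on X, slope of X on Y) for zero-mean variables."""
    slope_y_on_x = corr * sd_y / sd_x  # Y = (b*c/a) * X
    slope_x_on_y = corr * sd_x / sd_y  # X = (a*c/b) * Y
    return slope_y_on_x, slope_x_on_y

# Assumed model (taken from later in the post): Y = X + D1,
# with sd(X) = 1 and sd(D1) = d1 = 4/3, so that corr(X, Y) = 0.6.
d1 = 4 / 3
sd_x = 1.0
sd_y = math.sqrt(1 + d1 ** 2)
corr = 1 / math.sqrt(1 + d1 ** 2)

y_on_x, x_on_y = regression_slopes(sd_x, sd_y, corr)
print(f"slope of Y on X: {y_on_x:.3f}")
print(f"slope of X on Y: {x_on_y:.3f}")
```

Under these assumptions the slope of Y on X comes out as 1, as the edit note says, while the slope for predicting X from Y is c·a/b = 0.36 rather than 0.6.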

Salviati instead tries to measure X, and finds a variable Z which is experimentally found to have a good chance of lying close to X. Let us suppose that the standard deviation of Z-X is 10% that of X.

How do these two approaches compare?

A correlation of 0.6 is generally considered pretty high in psychology and social science, especially if it's established with p=0.001 to be above, say, 0.5. So Simplicio is quite pleased with himself.

A measurement whose range of error is 10% of the range of the thing measured is about as bad as it could be and still be called a measurement. (One might argue that any sort of entanglement whatever is a measurement, but one would be wrong.) It's a rubber tape measure. By that standard, Salviati is doing rather badly.

In effect, Simplicio is trying to predict someone's weight from their height, while Salviati is putting them on a (rather poor) weighing machine (and both, presumably, are putting their subjects on a very expensive and accurate weighing machine to obtain their true weights).

So we are comparing a good correlation with a bad measurement. How do they stack up? Let us suppose that the underlying reality is that Y = X + D_{1} and Z = X + D_{2}, where X, D_{1}, and D_{2} are normally distributed and uncorrelated (and causally unrelated, which is a stronger condition). I'm choosing the normal distribution because it's easy to calculate exact numbers, but I don't believe the conclusions would be substantially different for other distributions.

For convenience, assume the variables are normalised to all have mean zero, and let X, D_{1}, and D_{2} have standard deviations 1, d_{1}, and d_{2} respectively.

Z-X is D_{2}, so d_{2} = 0.1. The correlation between Z and X is c(X,Z) = cov(X,Z)/(sd(X)sd(Z)) = 1/sqrt(1+d_{2}^{2}) = 0.995.

The correlation between X and Y is c(X,Y) = 1/sqrt(1+d_{1}^{2}) = 0.6, so d_{1} = 1.333.
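The two conversions above, from noise level to correlation and back, can be sketched as follows. The function names are hypothetical, and the code assumes the additive model stated above, with sd(X) = 1 and noise uncorrelated with X.

```python
import math

def corr_from_noise_sd(d):
    """Correlation between X and X + D when sd(X) = 1 and sd(D) = d."""
    return 1 / math.sqrt(1 + d * d)

def noise_sd_from_corr(c):
    """Inverse of corr_from_noise_sd: the noise sd implied by correlation c."""
    return math.sqrt(1 / (c * c) - 1)

c_xz = corr_from_noise_sd(0.1)  # Salviati's measurement Z
d_1 = noise_sd_from_corr(0.6)   # Simplicio's proxy Y

print(f"c(X,Z) = {c_xz:.4f}")
print(f"d_1    = {d_1:.4f}")
```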

We immediately see something suspicious here. Even a terrible measurement yields a sky-high correlation. Or put the other way round, if you're bothering to measure correlations, your data are rubbish. Even this "good" correlation gives a signal to noise ratio of less than 1. But let us proceed to calculate the mutual informations. How much do Y and Z tell you about X, separately or together?

For the bivariate normal distribution, the mutual information between variables A and B with correlation c is lg(I), where lg is the binary logarithm and I = sd(A)/sd(A|B). (The denominator here -- the standard deviation of A conditional on the value of B -- happens to be independent of the particular value of B for this distribution.) This works out to 1/sqrt(1-c^{2}). So the mutual information is -lg(sqrt(1-c^{2})).

| | corr. | mut. inf. (bits) |
|---|---|---|
| Simplicio | 0.6 | 0.3219 |
| Salviati | 0.995 | 3.3291 |
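The table entries follow directly from the formula -lg(sqrt(1-c^{2})). Here is a minimal sketch that reproduces them, assuming the bivariate normal model; `mutual_information_bits` is an illustrative name, not something from the original post.

```python
import math

def mutual_information_bits(c):
    """Mutual information in bits between two jointly normal variables
    with correlation c: -lg(sqrt(1 - c^2)) = -0.5 * lg(1 - c^2)."""
    return -0.5 * math.log2(1 - c * c)

c_simplicio = 0.6
c_salviati = 1 / math.sqrt(1 + 0.1 ** 2)  # noise sd 0.1, about 0.995

mi_simplicio = mutual_information_bits(c_simplicio)
mi_salviati = mutual_information_bits(c_salviati)

print(f"Simplicio: {mi_simplicio:.4f} bits")
print(f"Salviati:  {mi_salviati:.4f} bits")
```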

What can you do with one third of a bit? If Simplicio tries to predict just the sign of X from the sign of Y, he will be right only 70% of the time (i.e. cos^{-1}(-c(X,Y))/π). Salviati will be right 96.8% of the time. Salviati's estimate will even be in the right decile 89% of the time, while on that task Simplicio can hardly do better than chance. So even a good correlation is useless as a measurement.
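The sign-prediction figures can be checked both from the closed form cos^{-1}(-c)/π and by simulation. The sketch below assumes the same normal model as above; the function names, trial count, and seed are arbitrary choices for illustration. It covers only the sign task, not the decile task.

```python
import math
import random

def sign_agreement_probability(c):
    """P(sign(A) == sign(B)) for zero-mean jointly normal A, B with correlation c."""
    return math.acos(-c) / math.pi

def simulate_sign_agreement(noise_sd, trials=100_000, seed=0):
    """Monte Carlo estimate of how often X and X + noise share a sign."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)
        proxy = x + rng.gauss(0.0, noise_sd)
        if (x > 0) == (proxy > 0):
            hits += 1
    return hits / trials

p_simplicio = sign_agreement_probability(0.6)                        # about 0.705
p_salviati = sign_agreement_probability(1 / math.sqrt(1 + 0.1 ** 2))  # about 0.968

print(f"Simplicio: exact {p_simplicio:.3f}, simulated {simulate_sign_agreement(4 / 3):.3f}")
print(f"Salviati:  exact {p_salviati:.3f}, simulated {simulate_sign_agreement(0.1):.3f}")
```

The simulated values will differ slightly from the exact ones because of sampling noise.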

Simplicio and Salviati show their results to Prof. Sagredo. Simplicio can't figure out how Salviati did so much better without taking measurements on thousands of samples. Salviati seemed to just think about the problem and come up with a contraption out of nowhere that did the job, without doing a single statistical test. "But at least," says Simplicio, "you can't throw away my 0.3219, it all adds up!" Sagredo points out that it literally does not add up. The information gained about X from Y and Z together is not 0.3219+3.3291 = 3.6510 bits. The correct result is found from the standard deviation of X conditional on both Y and Z, which is sqrt(1/(1 + 1/d_{1}^{2} + 1/d_{2}^{2})). The information gained is then lg(sqrt(1 + 1/d_{1}^{2} + 1/d_{2}^{2})) = 0.5*lg(101.5625) = 3.3331. The extra information over knowing just Z is only 0.0040 = 1/250 of a bit, because nearly all of Simplicio's information is already included in Salviati's.
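Sagredo's arithmetic can be reproduced with one small function. This is a sketch under the stated model only (sd(X) = 1, independent normal noise terms); the name `info_bits_from_noise_sds` is made up for illustration.

```python
import math

def info_bits_from_noise_sds(noise_sds):
    """Bits of information about X (sd 1) from observations X + D_i,
    where the D_i are independent normal noises with the given sds.
    The conditional variance of X is 1 / (1 + sum(1 / d_i^2))."""
    precision = 1 + sum(1 / (d * d) for d in noise_sds)
    return 0.5 * math.log2(precision)

d1, d2 = 4 / 3, 0.1

info_y = info_bits_from_noise_sds([d1])
info_z = info_bits_from_noise_sds([d2])
info_both = info_bits_from_noise_sds([d1, d2])

print(f"Y alone:      {info_y:.4f} bits")
print(f"Z alone:      {info_z:.4f} bits")
print(f"Y and Z:      {info_both:.4f} bits")
print(f"naive sum:    {info_y + info_z:.4f} bits")
print(f"gain over Z:  {info_both - info_z:.4f} bits")
```

The gap between the naive sum and the joint figure is the information that Y and Z share about X.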

Sagredo tells Simplicio to go away and come up with some real data.

## Comments (56)

Good body, strange intro - what you're doing in the article is using probability theory to compare a certain statistical tool, correlation, with a different sort of evidence that is much more highly correlated, namely what you're calling "terrible measurement". You're using the probability-theoretic tool of conditional probability and mutual information of probability distributions to point this out.

To present this as a refutation of probability is just odd.

(Couldn't get back to this earlier -- busybusybusy before taking a holiday.)

To read it as one is odd. The "strange" intro listed several areas in which probability and statistics are useful (although with slight caveats to the cases of medical research and quality control). The rest is an illustration of its limitations in practice.

I expand more on this in my response to Douglas_Knight and gjm.

The article starts and ends with the claim that "probability" is inferior to "real measurement" but I have no idea what the distinction is supposed to be. Salviati's attempt at "real measurement" got him a much better instrument, Z, than when Simplicio "collects some experimental data," but that doesn't mean anything. There's no point of view that's going to make Y look like a better instrument than Z. I suppose someone might be impressed by the p<.001 claim that the cor(X,Y) > .5, but if Salviati knows that cor(X,Z) > .95 with p<.1, he probably knows that cor(X,Z) > .5 with p<.001 anyhow!

It certainly is true that a correlation of .6 doesn't give you a good measurement. (Is that the point?)

I concur. This has nothing to do with the relevance or value of probability and statistics; it's just debunking the idea that a correlation coefficient that's substantial but not very close to +-1 gives you much predictive power.

What makes Simplicio's performance worse than Salviati's isn't the fact that he's using probability and statistics. It's the fact that the information he has available is very poor. Describing what he's got in terms of correlation coefficients has, at most, the effect of obscuring just how terrible they are, but that's not a problem with probability and statistics, it's a problem with not understanding probability and statistics.

Douglas_Knight:

That is part of it.

gjm:

That is more of it.

gjm:

And this is the final part. As a matter of practical fact -- look at almost any scientific paper that presents correlation coefficients -- if you are calculating correlations, 0.6 is about typical of the correlations you will be finding, and I think I'm being generous there. The reason you don't see correlations of 0.995 reported, let alone 0.99995 (i.e. a measurement to two significant figures) is that if your data were that good, you wouldn't waste your time doing statistics on them. A correlation of 0.6 means that you have poor data and almost no predictive capacity. It takes a correlation of 0.866 to get even 1 bit of mutual information. How often do you see correlations of that size reported?
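The thresholds quoted in this comment can be checked by inverting the mutual-information formula from the post. This is a hedged sketch under the same bivariate normal assumption; `corr_for_bits` and `corr_from_noise_sd` are illustrative names.

```python
import math

def corr_for_bits(bits):
    """Correlation needed for a given mutual information (in bits),
    obtained by inverting bits = -0.5 * lg(1 - c^2)."""
    return math.sqrt(1 - 2 ** (-2 * bits))

def corr_from_noise_sd(d):
    """Correlation between X and X + D when sd(X) = 1 and sd(D) = d."""
    return 1 / math.sqrt(1 + d * d)

c_one_bit = corr_for_bits(1)             # correlation needed for one bit
c_two_figures = corr_from_noise_sd(0.01)  # error sd of 1% of sd(X)

print(f"correlation for 1 bit:        {c_one_bit:.3f}")
print(f"correlation for 1% error sd:  {c_two_figures:.5f}")
```

The first value is sqrt(0.75), which matches the 0.866 quoted above.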

Statistics is the science of precisely wringing what little information there is from foggy data. And yet, people keep on drawing lines through scatterplots and summarising results as "X's are Y's", even when the implied prediction does only fractionally better than chance.

Eliezer wrote: "Let the winds of evidence blow you about as though you are a leaf, with no direction of your own", which is very inspiring, but in practical terms cannot be taken literally. If you are being blown up and down the probability scale, your probabilities are nowhere near 0 or 1. You can only be easily swayed when you are ignorant. You can only remain easily swayed by remaining ignorant. The moment you acquire knowledge, instead of precisely measured ignorance, you are wearing lead-weighted boots.

That's what I took the point to be. The initial descriptions of what Simplicio and Salviati accomplished make them sound comparable. It wouldn't occur to most that one was overwhelmingly superior to the other. But working it out shows otherwise.

It's true that a lot is buried in the line "Salviati instead tries to measure X, and finds a variable Z which is experimentally found to have a good chance of lying close to X." What was required to establish this "experimental finding"? It might have taken labors far in excess of Simplicio's. But now we know that, unless Salviati had to do much, much more work, his approach is to be preferred.

I think the superiority will be obvious to anyone who's ever seen a few scatterplots of correlated variables, and who can imagine a graph of X against X + noise where sd(noise) = 0.1*sd(X), and who thinks for a moment. Of course many people, much of the time, won't actually think for a moment, but that's a very general problem that can strike anywhere.

Suppose the story had gone like this: Simplicio measures X, and does it so well that his measurement has a correlation of 0.6 with X. Salviati examines lots of pairs (X,Y) and finds that X and Y typically differ by about 0.1 times the s.d. of X. Then the result would have been the same as before. Would that be a reason to say "measurement is no good; use probability and statistics instead"? Of course not.

Indeed. What matters is not what the procedures are called, but how they compare. Salviati's results completely trump Simplicio's.

Correlation, maybe?

Eh?

That's just not what correlation means. (If we have, say, X=0.6Y or X=100Y or X=0.0001Y, exactly in each case, then the correlation is 1. The correlation tells you nothing about the coefficient in the relationship.)

Presumably X and Y have been converted to canonical form with mean 0, sd 1.

Yes, that was an error. I was thinking of the case where X and Y are both normalised to have s.d. 1, in which case the regression line is indeed Y = cX, but that isn't the case here. In general, the line is Y = bcX/a where the standard deviations of X and Y are a and b.

Can you give some concrete examples of people who you think are making this mistake?

When I was starting out in trading I worked at a company where most of the traders were "spread traders" in the futures markets. They would trade either cash vs. futures or different futures expirations against each other. So, for instance, if you had future F1 that expired in September, and future F2 on the same underlying product that expired in December, they would define the spread between them as F1-F2 and, basically, try to buy that spread and sell it (or sell it and then buy it) over and over again. While F1 and F2 were whipping around, F1-F2 would tend to be pretty steady. The bid/ask spread of F1-F2 was determined by the volatility of F1 and F2, but the volatility of F1-F2 was much lower, so the bid/ask of F1-F2 was large compared to its volatility, which is a recipe for juicy trading. So, anyway, these people wanted me to trade equities this way, but I had a super hard time with it. I would take two equities (E1 and E2) that were, say, 90% correlated, do the regression to get the ratio (r), and then start thinking about E1-rE2 as a spread just like I was used to. What I realized is that for 90% correlated instruments the spread volatility is 31% of the volatility of the naked instrument, which is still large compared to the bid/ask. The individual instruments that the futures traders were spreading were nearly 100% correlated, which is basically the requirement to have a spread market that you can reasonably talk about.

He already did:

This is a standard PCT criticism of psychology and social science, i.e., that these low correlation levels are an indicator that they're measuring the wrong things, and indeed using the wrong model of what kinds of things to measure.

(Specifically, the complaint is that those sciences assume organisms are open-loop stimulus-response reactors rather than closed-loop controllers, so they try to measure input-output correlations in their experiments, instead of using their experiments to identify the variables that organisms are able to control or tend to control.)

But social science doesn't respect a correlation of .6 because they think it's a good way to measure something that could be measured directly. They find correlations either as an important step in establishing causation, a way to get large-scale trends, or a good way to measure something that can't be measured directly.

The correlation between smoking and lung cancer is only .7, but that's a very interesting fact. True, just picking out smokers is a terrible way to predict who has lung cancer when compared to even a so-so screening test, which is what I interpreted the point of Richard's article as being. But knowing that there's a high correlation there is useful for other reasons. Since we now know it's causative, we can use it to convince people not to smoke. Even if we didn't know there was causation, it would at least help us to pick out who needs more frequent lung cancer screening tests. So I am not prepared to immediately accept that someone is doing something wrong if they call a correlation of .6 pretty high.

Can you or Richard give an example of something the people investigating lung cancer could have done with direct measurement that would have been more productive than analyzing the cigarettes-smoking correlation? If not, can you provide a situation where people did overuse correlations when they'd have been better off using a measurement?

I wasn't aware that this was considered either psychology or social science; those are the fields towards which the criticism I pointed out was addressed, not medicine. (Medicine has a rather different set of statistics-based, politics-based, and payola-based errors to deal with.)

Correlation's a useful tool when that's all you have; the PCT criticism is that we now have more to go on than that where humans' and other organisms' behavior are concerned, so it's time to become dissatisfied with the old way and get started on improving things.

(Edit to add: WTF? This is the most baffling downvote I've ever seen OR received here, and I've seen some pretty weird ones in the past.)

Really, I'm not hostile to PCT, just skeptical -- but given your claims about the predictive power of PCT, and given that it's been studied for 35 years by a large group including several former academics, I think it's fair to ask this: Can you direct me to an experiment such that

Note the importance of step 2. The results you've so far pointed out to me (can't find them within LW, sorry) concern a person manipulating a dial to keep a dot in the center of the screen while acted on by unknown, varying forces, and a rat varying the pressure on a lever it needs to hold down in response to varying counterforces. Since these are cases in which 'acting like a controller' is a simple strategy that produces near-optimal results, it doesn't surprise other theories of cognition that the agents arrived at this strategy. (I find it quite probable, in fact, that some form of control theory governs much of our motor impulses, since that's a fairly simple and elegant solution to recurring problems of balance, varying strain, etc.) The point where PCT really diverges from mainstream theories of cognition is in the description of cognitive content, not motor response; and that's where PCT's burden of proof lies.

If PCT is as well-developed across levels as you claim (and well-developed enough to make diagnoses and prescriptions for, say, emotional issues), then it should be easy to make and test such a prediction in a cognitive domain. If you can present me with an experiment that clearly meets those four conditions, I'll be very interested in whatever PCT book you recommend. If 30 years haven't produced such results, then that counts as evidence too.

'Standard theories of cognition' is a broad class that includes so many conflicting and open-ended models that I'm not sure I could come up with an experiment/experimental result pair that fulfills this requirement, even without the requirement that the experiment actually have that result.

That's a good point. I'll have to think carefully about what kind of results would constitute a "surprising" result to theories of mind that include basic modeling capacities and preferences in the usual fashion. Any good suggestions for emending requirement 2 would be appreciated.

And when you do, what you'll discover is that none of them really predict anything we don't already know about human behavior, or provide a reductionistic model of it.

What's different about PCT is that it gives us a framework for making and testing reductionist hypotheses about what is causing an individual's behavior. We can postulate variables they're controlling, do things to disturb the values of those variables, and observe whether the values are indeed being controlled by the person's behavior.

For example, if we want to know whether someone's "Bruce"-like behavior is due to a fear of success or a desire for failure, we could artificially induce success or failure experiences and observe whether they adjust their behavior to compensate.

Now try that with the standard cognitive theories, which will only give us ways to describe what the person actually does, or make probabilistic estimates about what people usually do in that situation, rather than any way to reduce or compress our description of the person's behavior, so that it becomes a more general predictive principle, instead of just a lengthy description of events.

OK, excellent; since you assert that PCT has so much more predictive power, I'm sure you can show me many impressive, quantitative PCT-driven experimental results that aren't in a domain (like motor response or game strategy) where I already expect to see control-system-like behavior.

For example, if you could get a mean squared error of 10% in predicting a response that balances ethical impulses against selfish ones (say, the amount that a person is willing to donate to a charity, given some sort of priming stimuli), then I'd consider that good evidence. That's the sort of result that would get me to pick up a PCT textbook.

Seriously, please point me to these results.

You've just crossed over two different definitions of "predictive" -- not to mention two different definitions of "science". What I described was something that would give you a "hard", strictly falsifiable fact: is the person controlling variable X or not?

That's actual science. But what you've asked for instead is precisely the sort of probabilistic mush that is being critiqued here in the first place. You are saying, "yes, it's all very well that science can be used to determine the actual facts, but I want some probabilities! Give me some uncertainty, dammit!"

And as a result, you seem to be under the mistaken impression that PCT has some sort of evidence deficiency I need to fix, when it's actually psychology that has a modeling deficiency that needs fixing. How about you show me a genuinely reductionistic (as opposed to merely descriptive) model of human psychology that's been proposed since Skinner?

I only mentioned PCT in this thread in the context of Yvain's request for an example of people making the mistake Richard wrote this post about. And you responded to my criticism of psychology (i.e., it's not a "hard" science) by raising criticisms of PCT that are in fact off-topic to the discussion at hand.

Are you claiming that, if PCT is flawed, then everything in psychology is just jim-dandy fine? Because that's a pretty ludicrous position. Check your logic, and address the topic actually at hand: the complete failure of cognitive-level psychology to come up with a halfway decent reduction of human behavior, instead of just cataloging examples of it.

Otherwise, you are in the exact same position as an intelligent-design advocate pretending that gaps in evolutionary biology mean you don't have to consider the gaps in your own theory, or lack thereof.

Because PCT could be ludicrously wrong, and it would still be a huge advance in the current state of psychology to be able to nail down with any precision why or how it was wrong.

Which is why critique of PCT is irrelevant to this topic: you could disprove PCT utterly, and the given criticism of psychology would still stand, just like disproving evolution wouldn't make "God did it" any more plausible or useful of a theory.

So let's say, for the sake of argument, that I utterly recant of PCT and say it's all gibberish. How would that improve the shoddy state of psychology in the slightest? What would you propose to replace PCT as an actual model of human innards?

Let's hear it. Name for us the very best that modern psychology has given us since Skinner, of any attempt to actually define an executable model of human behavior. Has anyone even tried, who wasn't an outsider to the field?

A correlation of 0.6 is a bad measurement, period. It does not become a good one for want of a better.

I don't know what you mean by "analysing" a correlation, but this is some of what they did do.

I could have mentioned epidemiology in my intro. The reason it depends on statistics is that it is often much more difficult to discern the actual mechanism of a disease process than to do statistical studies. Googling turns up this study which is claimed (by the scientist doing the work) to be the very first demonstration of a causal link between smoking and lung cancer -- in April of this year (and not the 1st of the month).

But the correlations remain what they are, and it still takes a lot of work to get somewhere with them.

A bad measurement can still be the best there is.

But it is useful. I think Yvain asked the wrong question. You can do better than correlations, but do you deny that you can draw from them the conclusions that Yvain does? (i.e., the population effect of smoking)

The MN scientist is lying. No, I didn't click on the link. Yes, I mean lying, not mistaken.

The conclusion he draws is:

Sure, standard statistics. No problem, for want of anything better.

On the other hand, if you want to know how the link between smoking and lung cancer works, the epidemiology can do no more than suggest places to look.

On closer reading, the actual scientific claim is less than I thought. It's a statistical study correlating the presence of a nitrosamine compound in the urine with lung cancer, and finding a higher correlation than with self-reported smoking. Original paper (full text requires subscription) here and blogged here. So just more statistical epidemiology and not at all epoch-making.

ETA: Extra links, just because these things are worth knowing.

As pjeby points out, I gave the examples of "psychology and social science". Look at reports that summarise statistical results by claims of the form "X's are Y's", sometimes by the scientists themselves, not journalists. If you want something more concrete than those generalities, see the context of this comment.

I'm not sure what you're trying to prove here. I don't think it's fair to compare a correlation coefficient, which gives you a single parameter and can be used without knowledge of the shapes of the underlying distributions, to a confidence interval for X around Z, which gives you 2 parameters, in a situation where the data actually is normally distributed. Furthermore, you are comparing a correlation coefficient of 0.6 to a measurement where Z-X is within 10% of a standard deviation of X! That's outrageously accurate. For instance, the standard deviation of height of men is 3 inches; so when you want to know someone's height X, you are given a measurement Z that is within 5/16" of X. That's almost within the range of measurement error you would get measuring X directly.

Make a comparison where you're given correlations of .67 to 2 independent variables, versus a measurement that gives you 90% confidence of being within 2 standard deviations of the value of X, where Z is represented as being normally distributed around X, but is (unknown to Salviati) highly-skewed around X. I haven't actually worked out the math to see what a fair test would be, but the example written up here is egregiously unfair.

In other words, don't think you're done because you've found a high correlation.

Great write-up.

It would be infuriating to deal with a lazy Bayesian, who refuses to get more data, but only wants to swap priors with you :)

That... was pretty epic. Instant upvote and thanks. I suddenly feel an urgent need to go change my mind on many real-world issues.