I agree that range restriction is important, and I think a range-restriction story can become basically isomorphic to my post (e.g. "even if two variables are strongly correlated overall, once you restrict the range to the top 1% of one distribution, the correlation is lost in the noise, so it should not surprise us that the biggest X isn't the biggest Y").
My post might be slightly better for people who tend to visualize things, and I suppose it has a slight advantage in that it suggests why you are more likely to see this effect as the number of observations increases, which isn't so obvious when talking about a loss of correlation.
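For concreteness, here is a quick simulation of that range-restriction point (a sketch using NumPy; the 0.9 correlation and the top-1% cutoff are just illustrative choices, not anything from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
rho = 0.9  # a deliberately strong correlation

# Bivariate normal (X, Y) with correlation rho.
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Full-sample correlation: close to 0.9.
full_corr = np.corrcoef(x, y)[0, 1]

# Restrict to the top 1% of X: the correlation shrinks noticeably.
cutoff = np.quantile(x, 0.99)
mask = x >= cutoff
top_corr = np.corrcoef(x[mask], y[mask])[0, 1]

print(f"full-sample r = {full_corr:.3f}")
print(f"top-1% r      = {top_corr:.3f}")

# And the largest X need not be the largest Y, even at r = 0.9.
print(np.argmax(x) == np.argmax(y))
```

Even with r = 0.9 in the full sample, the within-tail correlation is much weaker, which is exactly the "other factors dominate at the extreme" picture.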
A better way would be to make the criticisms more concrete. What does "not particularly informative and it isn't scientifically sound at all" mean? You might, for example, have said something to the effect that the ellipses are contours of the bivariate normal distribution with the same correlation, and pointed out that not all bivariate distributions are normal. But on the other hand the scatterplots presented aren't so far away from normal that the ellipses are misleading. The ellipses are indeed intuitive and illustrative; but calling them "just fiction" is another way of expressing criticism too vague to respond to. The point masses and frictionless pulleys of school physics problems are also fictions, but none the worse for that.
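To make the contour claim concrete: the level sets of a bivariate normal density are ellipses determined entirely by the covariance matrix. A minimal sketch (NumPy assumed; the correlation value 0.7 is just an illustration):

```python
import numpy as np

rho = 0.7  # illustrative correlation
cov = np.array([[1.0, rho], [rho, 1.0]])

# Contours of a bivariate normal are ellipses whose axes lie along
# the eigenvectors of the covariance matrix, with lengths scaling
# as the square roots of the eigenvalues.
eigvals, eigvecs = np.linalg.eigh(cov)
# For unit variances, the eigenvalues are 1 - rho and 1 + rho,
# with axes along the diagonals y = -x and y = x.

# Points on the 1-sigma contour: map the unit circle through
# the "square root" of the covariance matrix.
t = np.linspace(0, 2 * np.pi, 200)
circle = np.stack([np.cos(t), np.sin(t)])
ellipse = eigvecs @ np.diag(np.sqrt(eigvals)) @ circle
```

Every point p on `ellipse` satisfies p' Σ⁻¹ p = 1, i.e. it sits on a constant-density contour, which is all the ellipses in the post are depicting.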
This is also vague:
(Where, and what did they say? We cannot know what better resources you know of unless you tell us.)
And this:
There is no "newby section" on LessWrong.
Besides, you're talking there about something you previously called "just wrong". First it's "just wrong", then it's "not particularly informative", then it's "illustrative", then "it has its place in the newby section". It reminds me of the old adage about the stages of truth, with the entire sequence here compressed into a single comment.
What isn't "concrete" about it? I think the whole article is an exercise in stating the obvious, to those who have had basic education in statistics. Stricter correlations tend to be more linear. A broader spectrum of data points is pretty much by definition "fatter". I don't see how this is actually very instructive. And to be honest, I don't see how I could be much more specific.
You mean you've never had a statistics class? Honestly? I'm not trying to be snide, just asking.
Extreme data points are often called "outliers" for a reason. Since (again, almost -- but not quite -- by definition, it depends on circumstances) they do not generally show as strong a correlation, "other factors may weigh more". This is not a revelation. I don't disagree with it, I'm simply saying it's rather elementary logic.
Which brings us back to the main point I was making: I did not feel this was particularly instructive.
Wrong in the sense that I don't see any actual demonstrated relationship between his ellipses and the data, except for simple, rather intuitive observation. It's merely an illustrative tool. More specifically:
This is an incorrect statement. What he is offering is a way to describe how data at the extreme ends may deviate from the overall correlation. Not "why". There is nothing here establishing causation.
If we are to be "less wrong", then we should endeavor to not make confused comments like that.