1 min read · 11th Apr 2013 · 18 comments · −7

I found this post very interesting.

It's about statistics, causal inference, and 'g'.


Physicist Steve Hsu claims it's very misleading in not discussing extensive empirical research that has falsified the key claims, and links to a lengthy rebuttal.

I made it halfway through the comments thinking this post was about the gravitational constant.

It seems to me that it's fine to attack an existing model; however, you should then present an alternative model that does a better job empirically. I don't think the latter has been accomplished.

Thanks for the link.

Not that I feel particularly qualified to judge, but I'd say Dalliard has a way better argument. I wonder if Shalizi has written a response.

It only just came out, but given that in his earlier posts he expressed disgust with the entire field and regretted writing anything on the topic, I wouldn't expect him to.

Moved to Discussion.

Characterizing g as a myth strikes me as pointless. Of course a model is a myth. It still may be a useful one.

What do the believers of g claim they can do with it, and can they?

I think the most convincing thing in Dalliard's critique is the section headed "Shalizi's second error" (the "first error" I think is simply a misreading of Shalizi; the "third error" is part misreading and part just Dalliard and Shalizi being interested in different things). Here, Dalliard says that Shalizi claims the only evidence offered for "g" is (in effect) the pattern of correlation between different test scores, whereas (according to Dalliard) advocates of "g" actually offer a whole lot of stronger evidence: confirmatory (as opposed to exploratory) factor analyses, various genetic investigations, etc.

I don't know enough about any of that stuff to evaluate Dalliard's claims against Shalizi's, though on the face of it it looks as if Shalizi has made a sweeping negative claim that simply doesn't fit the facts -- it would be Shalizi's job to show that the arguments Dalliard points at don't actually support belief in "g", not Dalliard's to show that they do. If anyone reading this is an expert in any of the relevant fields, I would be very interested in their opinion.

This was also the part of Dalliard's critique I found most convincing. Shalizi's argument seems to be a refutation of a straw man.

Dalliard writes as if Shalizi is proposing the lots-of-independent-factors model as his best account of what intelligence is actually like:

In contrast, these results are not at all what one would have expected based on the theory of intelligence that Shalizi advocates. According to Shalizi’s model, g factors reflect only the average or sum of the particular abilities called for by a given test battery, with batteries comprising different tests therefore almost always yielding different g factors. (I have more to say about Shalizi’s preferred theory later in this post.)

Here is one thing Shalizi actually writes about this model (emphasis mine):

Now, I don't mean to suggest this model of thousands of IID abilities adding up as a serious depiction of how thought works, or even of how intelligence test scores work. My point, like Thomson's, is to show you that the signs which the g-mongers point to as evidence for its reality, for there having to be a single predominant common cause, actually indicate nothing of the kind.

It seems to me that Dalliard is, at best, not reading Shalizi charitably.

(On the other hand, I would find Shalizi's argument more compelling if he offered a theory that (1) is at least kinda-credible as a model of how thought works, and (2) doesn't have any underlying mechanism resembling "g", and (3) fits the statistical data reasonably well.)
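For concreteness, here is a minimal simulation of the Thomson-style sampling model Shalizi describes above. None of the parameter values come from his post; they are my own illustrative assumptions. The point it reproduces is that a battery built from many independent abilities still yields uniformly positive inter-test correlations and a dominant first factor.

```python
# A toy version of Thomson's sampling model: many independent "abilities",
# each test drawing on a random subset of them. All parameter values below
# are illustrative assumptions, not taken from Shalizi's post.
import numpy as np

rng = np.random.default_rng(0)

n_people = 5000      # simulated test-takers
n_abilities = 1000   # independent, identically distributed abilities
n_tests = 10         # size of the test battery
p_sampled = 0.25     # chance that a given test draws on a given ability

# Each person's abilities are independent standard normals.
abilities = rng.standard_normal((n_people, n_abilities))

# Each test taps its own random subset of abilities; a score is their sum.
masks = (rng.random((n_tests, n_abilities)) < p_sampled).astype(float)
scores = abilities @ masks.T            # shape (n_people, n_tests)

# All inter-test correlations come out positive, and the first principal
# axis accounts for far more variance than any other -- with no common cause.
corr = np.corrcoef(scores, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]
off_diag = corr[~np.eye(n_tests, dtype=bool)]
print("smallest inter-test correlation:", off_diag.min())
print("first-axis share of variance:", eigvals[0] / eigvals.sum())
```

With these settings the inter-test correlations land around .25 and the first axis carries roughly a third of the variance, even though nothing resembling a single common factor exists in the generating model.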

Looking at the last section of Dalliard's critique (which is the one that addresses what I take to be one of the two central points of Shalizi's article, namely that one can get the sorts of correlations used as evidence by "g" theorists even when there is in fact no single common factor) it seems to me that the two of them are rather talking past one another. Or, since Shalizi wrote first and Dalliard second, it seems to me that Dalliard is missing (or doesn't care about) Shalizi's point.

Here's Shalizi:

The mythical aspect of g isn't that it can be defined, or, having been defined, that it describes a lot of the correlations on intelligence tests; the myth is that this tells us anything more than that those tests are positively correlated.

and here's Dalliard:

The question of whether or not there is a unidimensional scale of intelligence along which individuals can be arranged is independent of the question of what the neurobiological substrate of intelligence is like.

To put it differently (and if you're skeptical that this is a fair paraphrase, I suggest you read the bits of Shalizi's and Dalliard's articles from which those quotations come), Shalizi is saying "yeah, sure, you can give people tests of mental functioning and they'll correlate positively, which means you can measure people's 'general intelligence' and get somewhat-consistent results; but that doesn't mean there's any single underlying factor", and Dalliard is saying "yeah, sure, there might not be a single underlying factor, but because of those positive correlations you can measure people's 'general intelligence' and get somewhat-consistent results."

I should add that Dalliard does also offer, or at least gesture towards, some empirical evidence against the sort of no-common-factor model Shalizi is describing. Probably the work by Jensen that Dalliard quotes a few bits from here is more convincing than his quotations on their own; I found the latter vague and handwavy. Can anyone who has Jensen's book tell me whether it offers much actual support for what "it has been noted that"?

One thing Dalliard mentions is that the 'g' derived from different studies are 'statistically indistinguishable'. What's the technical content of this statement?

There is a test of how similar two factors are, the congruence coefficient. When it gives results above .95, the factors are usually taken to be indistinguishable. See e.g. Jensen, Arthur R., and Li-Jen Weng. "What is a good g?" Intelligence 18.3 (1994): 231-258.
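For what it's worth, the statistic in question is Tucker's congruence coefficient, the uncentered cosine between two columns of factor loadings. A minimal sketch, with loadings invented purely for illustration:

```python
# Tucker's congruence coefficient between two factor-loading vectors.
# The loadings below are made up for illustration, not from any study.
import numpy as np

def congruence_coefficient(a, b):
    """Uncentered cosine between two loading vectors (Tucker's phi)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Hypothetical g loadings of the same eight tests in two different batteries.
g_battery_1 = [0.81, 0.74, 0.69, 0.77, 0.62, 0.70, 0.58, 0.66]
g_battery_2 = [0.79, 0.76, 0.66, 0.80, 0.60, 0.68, 0.61, 0.63]

print(congruence_coefficient(g_battery_1, g_battery_2))  # above .95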

I only skimmed enough of the article for it to convince me of the opposite of what its author intended.

  • He gives the example of an exercise where he discovered the "g for cars" as being the size of the car, and that all the measurements were correlated with size, mostly positively. Well, gee, size is very much like g for cars. Big cars are bigger than smaller cars, and have more capabilities but need more fuel. Kind of like big brains are smarter than small brains, but need more fuel (across species).

  • He is saying, "g is just this artifact of having a bunch of intelligence tests that are all positively correlated". It's really not an argument at all; it's an attempt to imbue the semantics of factor analysis with negative connotations. Why are all those intelligence tests positively correlated in the first place?

  • For random data points in n dimensions, what fraction of their variance would we expect to be accounted for by the first axis found in factor analysis? That's the real question. But instead of looking at random correlation matrices, he looks at "random" matrices for factors that are all positively correlated! What made them all positively correlated with each other? The chance of even a single pair being positively correlated is only 50%! (A small simulation of this contrast follows below.)
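Here is a small simulation of the contrast the last bullet asks about; the settings (ten variables, 500 observations, a uniform off-diagonal correlation of .3) are my own illustrative choices rather than anything from Shalizi's post.

```python
# How much variance does the first axis explain for genuinely independent
# variables, versus for a correlation matrix whose off-diagonal entries are
# all forced to be positive? Settings are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
n_vars, n_obs = 10, 500

def first_axis_share(corr):
    """Fraction of total variance carried by the leading eigenvector."""
    eigvals = np.linalg.eigvalsh(corr)[::-1]
    return eigvals[0] / eigvals.sum()

# (a) Independent variables: sample correlations scatter around zero, about
# half of them negative, and the first axis explains little beyond 1/n_vars.
data = rng.standard_normal((n_obs, n_vars))
print("independent variables:", first_axis_share(np.corrcoef(data, rowvar=False)))

# (b) Every off-diagonal correlation forced to a positive value (.3 here):
# the first axis accounts for over a third of the variance by construction.
corr_pos = np.full((n_vars, n_vars), 0.3)
np.fill_diagonal(corr_pos, 1.0)
print("all correlations = .3:", first_axis_share(corr_pos))
```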

This is one of those puzzling cases where the author has a deep understanding of what he is talking about, and yet what he says ... is hopeless, so far from being relevant that it's hard to believe it isn't deliberate deception.

This is one of those puzzling cases where the author has a deep understanding of what he is talking about, and yet what he says ... is hopeless, so far from being relevant that it's hard to believe it isn't deliberate deception.

This is actually relatively common; the word for it is "rationalization".

Dalliard writes:

Shalizi alleges that there are tests that measure intelligence “in the ordinary sense” yet are uncorrelated with traditional tests, but unfortunately he does not give any examples.

but this appears to be false: I don't find any such allegation in Shalizi's article. Did I miss it, or did Dalliard misread, misunderstand, or (less likely) deliberately misrepresent Shalizi?

These quotes attack psychologists for failing to find such tests, which would be pointless if Shalizi confidently thought there weren't any:

Since intelligence tests are made to correlate with each other, it follows trivially that there must appear to be a general factor of intelligence. This is true whether or not there really is a single variable which explains test scores or not.

The psychologists start with some traits or phenomena, which seem somehow similar to them, to exhibit a common quality, be it "intelligence" or "neuroticism" or "authoritarianism" or what-have-you. The psychologists make up some tests where a high score seems, to intuition, to go with a high degree of the quality. They will even draw up several such tests, and show that they are all correlated, and extract a common factor from those correlations. So far, so good; or at least, so far, so non-circular. This test or battery of tests might be good for something. But now new tests are validated by showing that they are highly correlated with the common factor, and the validity of g is confirmed by pointing to how well intelligence tests correlate with one another and how much of the inter-test correlations g accounts for. (That is, to the extent construct validity is worried about at all, which, as Borsboom explains, is not as much as it should be. There are better ideas about validity, but they drive us back to problems of causal inference.) By this point, I'd guess it's impossible for something to become accepted as an "intelligence test" if it doesn't correlate well with the Weschler and its kin, no matter how much intelligence, in the ordinary sense, it requires, but, as we saw with the first simulated factor analysis example, that makes it inevitable that the leading factor fits well. [13] This is circular and self-confirming, and the real surprise is that it doesn't work better.

Not confidently thinking that there aren't any such tests is not the same thing as alleging that there are such tests. I agree that the first probably applies to Shalizi. Dalliard asserts that the second does, but it doesn't look to me as if Dalliard's assertion is true.