Scrutinize claims of scientific fact in support of opinion journalism.

Even with honest intent, it's difficult to apply science correctly, and it's rare that dishonest uses are punished. Citing a scientific result gives an easy patina of authority, which is rarely scratched by a casual reader. Without actually lying, the arguer may select from dozens of studies only the few with the strongest effect in their favor, when the overall body of evidence may point at no effect or even in the opposite direction. The reader only sees "statistically significant evidence for X". In some fields, the majority of published studies claim unjustified significance in order to gain publication, inciting these abuses.

Here are two recent examples:

Women are often better communicators because their brains are more networked for language. The majority of women are better at "mind-reading," than most men; they can read the emotions written on people's faces more quickly and easily, a talent jump-started by the vast swaths of neural real estate dedicated to processing emotions in the female brain.

- Susan Pinker, a psychologist, in NYT's "Do Women Make Better Bosses"

Twin studies and adoptive studies show that the overwhelming determinant of your weight is not your willpower; it's your genes. The heritability of weight is between .75 and .85. The heritability of height is between .9 and .95. And the older you are, the more heritable weight is.

- Megan McArdle, linked from the LW article The Obesity Myth


Mike, a biologist, gives an exasperated explanation of what heritability actually means:

Quantitative geneticists use [heritability] to calculate the changes to be expected from artificial or natural selection in a statistically steady environment. It says nothing about how much the over-all level of the trait is under genetic control, and it says nothing about how much the trait can change under environmental interventions.

Susan Pinker's female-boss-brain cheerleading is refuted by Gabriel Arana. A specific scientific claim Pinker makes ("the thicker corpus callosum connecting women's two hemispheres provides a swifter superhighway for processing social messages") is contradicted by a meta-analysis (Sex Differences in the Human Corpus Callosum: Myth or Reality?), and without that, you have only a just-so evolutionary psychology argument.

The Bishop and Wahlsten meta-analysis claims that the only consistent finding is for slightly larger average whole brain size and a very slightly larger corpus callosum in adult males. Here are some highlights:

Given that the CC interconnects so many functionally different regions of cerebral cortex, there is no reason to believe that a small difference in overall CC size will pertain to any specific psychological construct. Total absence of the corpus callosum tends to be associated with a ten-point or greater reduction in full-scale IQ, but more specific functional differences from IQ-matched controls are difficult to identify.
In one recent study, a modest correlation between cerebrum size and IQ within a sex was detected. At the same time, males and females differ substantially in brain size but not IQ. There could easily be some third factor or array of processes that acts to increase both brain size and IQ score for people of the same sex, even though brain size per se does not mediate the effect of the other factor on IQ.
The journal Science has refused to publish failures to replicate the 1982 claims of de Lacoste-Utamsing and Holloway (Byne, personal communication).

Obviously, if journals won't publish negative results, then this weakens the effective statistical significance of the positive results we do read. The authors don't find this to be significant for the topic (the above complaint isn't typical).

When many small-scale studies of small effects are published, the chances are good that a few will report a statistically significant sex difference. ... One of our local newspapers has indeed printed claims promulgated over wire services about new studies finding a sex difference in the corpus callosum but has yet to print a word about contrary findings which, as we have shown, far outnumber the statistically significant differences.

This effect is especially notable in media coverage of health and diet research.

The gold-standard in the medical literature is a cumulative meta-analysis conducted using the raw data. We urge investigators to make their raw data or, better yet, the actual tracings available for cumulative meta-analysis. We attempted to collect the raw data from studies of sex differences in the CC cited in an earlier version of this paper by writing to the authors. The level of response was astoundingly poor. In several studies that used MRI, the authors even stated that the original observations were no longer available.

This is disturbing. I suspect that many authors are hesitant to subject themselves to the sort of scrutiny they ought to welcome.

By convention, we are taught that the null hypothesis of no sex difference should be rejected if the probability of erroneously rejecting the null on the basis of a set of data is 5% or less. If 10 independent measures are analysed in one study, each with the α = 0.05 criterion, the probability of finding at least one ‘significant’ sex difference by chance alone is 1 − (1 − 0.05)^10 = 0.40 or 40%. Consequently, when J tests involving the same object, e.g. the corpus callosum, are done in one study, the criterion for significance of each test might better be adjusted to α/J, the Dunn or Bonferroni criterion that is described in many textbooks. All but two of 49 studies of the CC adopted α = 0.05 or even 0.10, and for 45 of these studies, an average of 10.2 measures were assessed with independent tests.

This is either rank incompetence, or even worse, the temptation to get some positive result out of the costly data collection.
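The 40% figure and the α/J adjustment in the quoted passage are easy to check. Here is a minimal Python sketch; the function names are my own, and it assumes the tests are independent, as the quote does:

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(alpha: float, num_tests: int) -> float:
    """Per-test threshold alpha / J (the Dunn or Bonferroni criterion)."""
    return alpha / num_tests


if __name__ == "__main__":
    alpha, num_tests = 0.05, 10
    uncorrected = familywise_error_rate(alpha, num_tests)
    per_test = bonferroni_alpha(alpha, num_tests)
    corrected = familywise_error_rate(per_test, num_tests)
    print(f"uncorrected familywise rate: {uncorrected:.3f}")
    print(f"Bonferroni per-test alpha:   {per_test:.4f}")
    print(f"corrected familywise rate:   {corrected:.3f}")
```

With the adjusted per-test threshold, the chance of at least one spurious "significant" result across all ten tests drops back below the nominal 5%.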

38 comments

I suspect that many authors are hesitant to subject themselves to the sort of scrutiny they ought to welcome.

Normative language ("ought") is not helpful here. Journals that nominally require publication of data or calculations don't enforce it, either.

One way to deal with selection bias and fraud that I have occasionally seen, and only in economics and parapsychology ("the control group for science"), is to compare the effect size to the study size. If it's a real effect, it will not depend on the study size. But if it's fake, it will always just barely be statistically significant and thus it will decline with study size.

This kind of meta-analysis comes from not trusting one's peers. This is rude, hence rare. But it's a lot more useful than pooling the data, the usual meta-analysis.
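The effect-size-versus-study-size check described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: each study is a z-test with known unit variance, only studies reaching two-sided p < 0.05 get "published", and all names here are hypothetical.

```python
import math
import random
import statistics

Z_CRIT = 1.96  # two-sided 5% cutoff for a z-test


def published_effects(true_effect, n, num_studies, rng):
    """Return |effect estimates| from the simulated studies that reach significance."""
    std_error = 1.0 / math.sqrt(n)  # assumes unit variance per observation
    published = []
    for _ in range(num_studies):
        # Draw the study's estimated effect directly from its sampling distribution.
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > Z_CRIT * std_error:
            published.append(abs(estimate))
    return published


def mean_published_effect(true_effect, n, num_studies=20000, seed=0):
    """Average published effect size for studies of size n."""
    rng = random.Random(seed)
    return statistics.mean(published_effects(true_effect, n, num_studies, rng))


if __name__ == "__main__":
    for true_effect in (0.0, 0.5):
        for n in (100, 1600):
            avg = mean_published_effect(true_effect, n)
            print(f"true effect {true_effect}, n={n}: mean published effect {avg:.3f}")
```

In this setup the published effect for a nonexistent effect shrinks as the studies get larger, hovering just above the significance cutoff, while the well-powered real effect stays near its true value at both sizes. Underpowered studies of a real effect would also show some inflation, so the check is a warning sign rather than proof.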

The obvious solution, IMO, is to have journals approve study designs for publication in advance, including all statistical tools to be used; and then you do the study and run the preselected analysis and publish the results, regardless of whether positive or negative.

But just like many other obvious improvements we can all think of to the process of science, this one will not be carried out.

parapsychology ("the control group for science")

Did you get that off me? I was planning a post on it at some point or another.

That's the obvious brute force solution, but a possibly more elegant route is just to have an international trials register. This suggestion has been around for a while, and should be significantly less costly (and controversial) than the pre-commit to publishing route while still giving some useful tools for checking on things like publication bias, double publication, etc.

But just like many other obvious improvements we can all think of to the process of science, this one will not be carried out.

To a certain extent, it is being carried out for drug studies, but it requires centralization. At least, various central authorities have promised to require some pre-registration, but they may fail, as in the data availability story. Individuals can do meta-analyses that are skeptical of the original publications, and they do, on special occasions.

I think I've heard the line about parapsychology as a joke in a number of places, but I heard it seriously from Vassar.

have journals approve study designs for publication in advance, including all statistical tools to be used; and then you do the study and run the preselected analysis and publish the results, regardless of whether positive or negative

Brilliant.

Maybe a notary service for such plans would become popular from the ground up. Of course, to get voluntary adoption, you'd have to implement a guarantee of secrecy for a desired time period (even though the interests of science would be best served by early publicity, those scientists want their priority).

Let's see, just the right protocol for signing/encrypting, and ... never mind, it will never be used until some high status scientists want to show off ;)

Parapsychology: The control group for science.

Excellent quote. May I steal it?

It's too good to ask permission for. I'll wait to get forgiveness ;).

Michelle Malkin is not Megan McArdle. All of their names begin with the letter "M" though.

McArdle also presents a false dichotomy between willpower and genes. Your genes presumably have a lot to do with your willpower (which is not to say that it's highly heritable, though in fact it may be). Perhaps she has not fully chucked the "ghost in the machine" and thinks willpower is the real, internal you as opposed to that genetic stuff from your parents.

Corrected. Whoops.

Not to defend dishonest interpretations of science here, but... "heritability" sounds like an unfortunate choice of word for the concept described. It invites inadvertent misrepresentations.

I'm reminded of an old OB comment by Anatoly Vorobey that made the reasonable point that Kolmogorov complexity captures the human notion of "complexity" very lousily at best. (WTF, the whole universe is less complex than one planet within it?) So too it seems with "heritability". People clearly want a number that would describe "how much the over-all level of the trait is under genetic control, and... how much the trait can change under environmental interventions" - why can't the biologists just give them that?

Because there is no such number. More variance in the environment will mean "less heritability".
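To make that concrete, here is a minimal Python sketch. It assumes the simplest textbook model, in which phenotypic variance is just genetic variance plus environmental variance, with no gene-environment interaction or correlation; the function name and the numbers are illustrative only.

```python
def heritability(var_genetic: float, var_environment: float) -> float:
    """Var(G) / (Var(G) + Var(E)) under a purely additive variance model."""
    return var_genetic / (var_genetic + var_environment)


if __name__ == "__main__":
    var_genetic = 1.0  # held fixed: the genes do not change between rows
    for var_environment in (0.25, 1.0, 4.0):
        h = heritability(var_genetic, var_environment)
        print(f"Var(E) = {var_environment}: heritability = {h:.2f}")
```

The genetic variance is identical in every row, yet the ratio falls as the environment becomes more varied, so the number describes a population in a particular environment rather than the trait itself.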

Fix: "...under the strongest environmental interventions known today".

Have fun trying to define what is accepted as an "environmental intervention" and what isn't.

(Getting your head smashed in with a hammer will end up reducing your body weight rather quickly, so going by your suggestion obesity is 0% heritable.)

How about "malleability"? Obesity is malleable either way (overeating, liposuction). IQ is highly malleable downwards (hammer to head), not so much upwards (a year of schooling gives +2 points). Eye color, 0% malleable. Maybe take a derivative in effort/time/money to change a trait in the desired direction.

This will be both more useful socially and vastly easier to estimate than "heritability", if we accept Shalizi's proof that "heritability" is almost impossible to measure. By the way, the original post relies upon that proof.

Let's call it 'genetic determinism'.

Re: WTF, the whole universe is less complex than one planet within it?

Possible, but we don't know if that's true or not: the Kolmogorov complexity of the universe is not known.

This would be easier to follow and more enjoyable to read if you stated what point you're trying to make, preferably before the large quotations.

You're right. I've made an attempt at editing the intro in that direction, which may be somewhat lame, as I had no point to make originally :)

I originally intended only to share the two examples without commentary, but as I examined the meta-analysis, it occurred to me that not everyone would have access to the full text.

I think it's a big improvement, thanks.

Note that McArdle responds in the comments:

Reread the post. I did not say that environment wasn't interacting with genes--indeed, that was the entire purpose of the height comparison. I was responding to people who claim that individual outcomes can't be rooted in genetics because after all, there were no fat people in Auschwitz, plus we're all getting fatter. I understand heritability quite well, thanks--or at least, I already knew everything you wrote before you wrote it, and nothing in my post contradicts it.

(Not saying anything about whether her defence is right or wrong, just pointing it out.)

Interesting. I don't believe her. I think her purpose was to suggest that weight is nearly as immutable under changes in diet and exercise as height.

Well, average height is also increasing in the population. Does that mean that you could be as tall as me, if you weren't too lazy to grow?

Twin studies and adoptive studies show that the overwhelming determinant of your weight is not your willpower; it's your genes. The heritability of weight is between .75 and .85. The heritability of height is between .9 and .95.

On the other hand, I do think there's far more to appetite and obesity than willpower[1].

Are there people with genetics such that, had they been given a diabetes+obesity inducing diet as children, they would still be rail thin and fidgety, burning tons of calories without explicit exercise? I think there are.

But I expect interaction between environment and genes to be very high in obesity[2], so heritability can't be used on its own to draw that conclusion.

[1] "studies of monozygotic and dizygotic twins have unambiguously shown that there is a much greater resemblance in the degree of obesity between genetically identical monozygotic twins" - http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1119832 - this is evidence for genetic variation in obesity with nearly constant environmental factors (clearly the availability of calories is a prerequisite for obesity).

[2] Extensive fictional evidence exists in the rotund mother who tries to fatten up her offspring when they return for the holidays: "put some meat on your bones!". Seriously,


This misuse of statistical significance is a standard practice in science. At least they have the decency to tell you about insignificant results, as opposed to sweeping them under the carpet.

This pretend significance, while mathematically false, isn't all that bad. With the bias journals have against publishing research in support of null hypotheses, very little would get published - or more likely some other tricks would be invented as bad as this one. And no matter how much you abused statistical significance by doing multiple independent tests, if someone repeats your study and finds the same result, it is honestly statistically significant then.

An article in Nature appears to contradict McArdle's claim that "the older you are, the more heritable weight is":

These findings suggest that adult body size, shape, and composition are highly heritable in both women and men, although a decreasing tendency is seen with advancing age.

I don't think this is McArdle's fault; there happen to be contradictory findings which have probably reported excessive confidence. An increase in heritability means that the genes really expressed themselves more strongly, or variation in environment decreased (or both, obviously). Looking at the reported research myself, I can only think "it's not safe to say whether age will result in more (or less) similar weights amongst genetically similar individuals".

This makes me wonder how much science will advance when it can be done by amateurs with better policies. (scrutiny)

Amateur astronomers make discoveries because telescopes are cheap, and findings can be easily corroborated online.

I'm sure people will find interesting things with cheap ultrasounds.

Why isn't there more amateur computer science and math then?

Why isn't there more amateur computer science [...]?

GNU?

I think there's plenty of it. There's people making their own rendering engines, playing with game AIs, experimenting with novel modes of human interaction (wiimotes, multitouch, iphone, openCV), evolving compressed paintings in javascript, and working on distributed/parallelized computing.

It's not all formal but a lot of it is open, and it occasionally explodes into something useful.

But people who are good at programming/research have a tendency to get hired to do it professionally, so there is little point in the status of productive amateur (in the large scheme of things), except, perhaps, easier entry to these fields.

Quantitative geneticists use [heritability] to calculate the changes to be expected from artificial or natural selection in a statistically steady environment. It says nothing about how much the over-all level of the trait is under genetic control, and it says nothing about how much the trait can change under environmental interventions.

I don't think that's right. The term "heritability" is used in twin studies, which do not involve a steady environment, and which are all about how much the trait is under genetic control.

Have you actually read the linked-to article? Heritability != genetic control. The textbook example:

The textbook example is that (essentially) all of the variance in the number of eyes, hearts, hands, kidneys, heads, etc. people have is environmental. (There are very, very few mutations which alter how many eyes people have, because those are strongly selected against, but people do lose eyes to environmental causes, such as accident, disease, torture, etc.) The heritability of these numbers is about as close to zero as possible, but the genetic control of them is about as absolute as possible.

That text is actually a quote from here, and that article is even more interesting and explicit on this point.
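The number-of-eyes example quoted above can be sketched as a toy simulation. The assumptions are mine, not the article's: every genotype specifies exactly two eyes, and each person independently loses one eye to environmental causes with some small probability. The names and the 1% figure are made up for illustration.

```python
import random
import statistics


def simulate_eye_counts(population, loss_probability, seed=0):
    """Return (genetically specified eye counts, observed eye counts)."""
    rng = random.Random(seed)
    genetic = [2.0] * population  # every genotype specifies two eyes
    observed = [
        g - (1.0 if rng.random() < loss_probability else 0.0)  # accident, disease, ...
        for g in genetic
    ]
    return genetic, observed


def variance_ratio(genetic, observed):
    """Genetic variance over phenotypic variance (requires some observed variation)."""
    return statistics.pvariance(genetic) / statistics.pvariance(observed)


if __name__ == "__main__":
    genetic, observed = simulate_eye_counts(population=10000, loss_probability=0.01)
    print("mean observed eyes:", statistics.mean(observed))
    print("heritability (variance ratio):", variance_ratio(genetic, observed))
```

All of the variation in the observed counts comes from the environment, so the ratio is zero even though the genes fix the baseline of two eyes for everyone.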

Okay, I'm reading the article now. I am no expert in this area, but it seems to just be wrong.

First, it is patently false that "heritability says nothing about how much the over-all level of the trait is under genetic control." Heritability is defined in a way that is designed to tell you how much of the trait is under genetic control. That's its purpose. It's not a perfect measure, but it's wrong to say that it tells you nothing about what it's designed to tell you something about.

I expect the textbook example of heritability of number of arms being misleading is a textbook example of when heritability measurements go wrong, not a textbook example of what heritability is supposed to measure.

The author's argument is that heritability is variance associated with different genotypes over total variance; all members of the population have different genes; therefore, everything has 100% heritability. Furthermore, the author goes on to say, there are interactions between genetics and environment, and other factors that are correlated with genetics, and so your heritability measurement isn't meaningful anyway.

This is wrong, for several reasons:

  • It would require psychologists to sequence the DNA of their subjects.

  • If it were correct, psychologists would eventually notice that everything had 100% heritability.

  • Psychologists design experiments measuring heritability so that some pairs in the population share more genes than other pairs.

  • Psychologists design experiments to try to control for those other factors correlated with genetics. If they don't, that's a design flaw.

I don't think the author is really saying that people are misunderstanding the technical definition of 'heritability'. He is saying that all of the studies of IQ have been poorly designed, and so didn't measure actual heritability.

The web page linked to seems to be politically-motivated, to show that IQ is not genetic. I also note that I read half of the book he refers to, which was written in response to The Bell Curve, and as science it was a lousy book. My recollection is that it was long on moralizing and attempts to create associations between The Bell Curve and Bad Things; but was not good at finding errors in the book it condemned so vigorously. It was also motivated by the same politics. It reminded me of what Einstein said when 30 Nazi scientists wrote a book against Relativity: "If they had been right, it would have taken only one scientist."

Godwin's Law! I win!

I think I can even call "large group of eminent scientists write a politically-motivated but scientifically weak book refuting another book" a trope, since the same thing happened with the "Against Sociobiology" letter of Gould etc.

I don't need to read the linked-to article, as I've read other articles using the term "heritability".

Wikipedia says: "In genetics, Heritability is the proportion of phenotypic variation in a population that is attributable to genetic variation among individuals." It defines it as

heritability (written H^2) = variance due to genes / variance in the population

heritability : genetic control :: correlation : causation

That's a partly-valid analogy, because things other than genetic control can cause high heritability measurements. But I don't think it's a strong analogy. You can't say, "Well, I might have the interpretation in the completely wrong direction here; the phenotypes might be controlling the genes."

Heritability is unary. Correlation is binary (I'm talking about arity, not domain). You can't get the "wrong direction" on a unary relation, but I guess that's just another reason I shouldn't have put that in the form of an analogy. I see that you're taking "heritability(trait) X" as "causes(gene-variance,trait-variance) X". That's definitely not what I intended.

I certainly wasn't trying to convince anyone of "heritability is nonsense!". According to Wikipedia, it seems that narrow-sense heritability, with gene-environment correlation removed, would be a direct indication of "genetic variation causes phenotypic variation" (within a framework of simple linear combination of each gene, and environment). I don't know how to tell if someone has actually obtained this number properly, though.

I preferred the examples to their critiques.

The critiques are just two unrelated complaints I stumbled upon today. They're not exceptional. But I do think the two examples of science in service of opinion journalism deserve criticism.