Or alternatively, it's a large effect, but the rarity of autism and of unvaccinated kids makes it hard to reach statistical significance given sampling error. So let's see: the suggestion here is that so few studies threw up a false positive because the true effect runs opposite to the alternative hypothesis, i.e. vaccines reduce autism.
Autism is... what, 0.5% of the general population of kids these days? And unvaccinated kids are, according to a random Mother Jones article, ~1.8%.
So let's imagine that vaccines halve the risk of autism, from a true 1.0% down to the observed 0.5% (halving certainly seems like a 'large' effect to me), that autism has the true base rate of 1.0% in the unvaccinated, and that the unvaccinated make up 1.8% of the population. If we randomly sampled the general population, how large a sample would we need to detect a difference in autism rates between the vaccinated and unvaccinated?
The regular R function I'd use for this, power.prop.test, doesn't work, since it assumes balanced sample sizes, not 1.8% in one group and 98.2% in the other. I could write a simulation to do the power calculation for a prop.test, since the test itself handles imbalanced sample sizes, but I googled and found that someone had written something very similar for the Wilcoxon U-test, so hey, I'll use the samplesize library instead. Filling in the relevant values, we find that for a decent chance (80% power) of detecting such a correlation of vaccination with reduced autism, it takes:
R> library(samplesize)
R> n.wilcox.ord(power = 0.8, alpha = 0.05, t = 0.018, c(0.005,0.995), c(0.010,0.990))
$`total sample size`
[1] 89947
$m
[1] 88328
$n
[1] 1619
A total n of about 90k (88,328 vaccinated plus 1,619 unvaccinated). I'm guessing that most studies don't get near that.
Of course, much of that penalty goes towards picking up enough kids who are both autistic and unvaccinated, so one could do better by preferentially sampling either of those groups, but then one gets into thorny questions about whether one's convenience samples are representative or biased in some way...
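For comparison, here is a rough sketch of the simulation approach mentioned above: estimating the power of prop.test by brute force when the groups are very unequal in size. sim_power_prop() is a hypothetical helper written for illustration, not anyone's published code; its default rates are just the hypothetical numbers from this comment. Because prop.test is a chi-squared test with continuity correction rather than a Wilcoxon test, its estimate should not be expected to match n.wilcox.ord's exactly.

```r
# Hypothetical helper: simulated power of prop.test() for a simple random
# sample of the population, with very unequal group sizes.
# Defaults are the comment's hypothetical numbers: 1.8% unvaccinated,
# autism 1.0% in the unvaccinated vs 0.5% in the vaccinated.
sim_power_prop <- function(n_total, frac_unvax = 0.018,
                           p_unvax = 0.010, p_vax = 0.005,
                           alpha = 0.05, n_sims = 2000) {
  rejected <- replicate(n_sims, {
    # Split a random sample of n_total kids by vaccination status
    n_unvax <- rbinom(1, n_total, frac_unvax)
    n_vax <- n_total - n_unvax
    # Autism cases observed in each group
    cases <- c(rbinom(1, n_unvax, p_unvax), rbinom(1, n_vax, p_vax))
    # prop.test warns when expected counts are small; the p-value is still returned
    p <- suppressWarnings(prop.test(cases, c(n_unvax, n_vax))$p.value)
    isTRUE(p < alpha)
  })
  # Fraction of simulated studies that reject the null = estimated power
  mean(rejected)
}

set.seed(2014)
sim_power_prop(90000)
```

Passing a larger frac_unvax (e.g. sim_power_prop(20000, frac_unvax = 0.5)) roughly mimics a design that deliberately oversamples unvaccinated kids, with all the representativeness caveats just mentioned.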
As the original article says, if there were no effect, you'd expect a few studies to get p < 0.05 by chance. Similarly, if there were no effect, you'd expect a few studies to get p > 0.95 by chance (on a one-sided test), suggesting that vaccines prevent autism. If vaccines do prevent autism, then p > 0.95 would be even more likely.
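A small simulation can illustrate this, under assumptions that are not in the comment: normally distributed outcomes, 50 subjects per arm, and a one-sided t-test whose alternative is that exposure raises the outcome. one_sided_p() is a hypothetical helper for illustration only.

```r
# Hypothetical helper: one-sided p-values from many simulated two-arm studies.
# H1 for every study: exposure RAISES the outcome.
one_sided_p <- function(true_shift, n_per_group = 50, n_studies = 4000) {
  replicate(n_studies, {
    exposed <- rnorm(n_per_group, mean = true_shift)
    control <- rnorm(n_per_group)
    t.test(exposed, control, alternative = "greater")$p.value
  })
}

set.seed(1)
p_null <- one_sided_p(true_shift = 0)     # no effect at all
p_prot <- one_sided_p(true_shift = -0.3)  # exposure actually lowers the outcome

mean(p_null < 0.05)  # "link found" by chance: close to 0.05
mean(p_null > 0.95)  # pointing the other way by chance: also close to 0.05
mean(p_prot > 0.95)  # much more common when the true effect is protective
```

Under the null, a one-sided p-value is uniformly distributed, which is why both tails catch roughly 5% of studies.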
A friend recently posted a link on his Facebook page to an informational graphic about the alleged link between the MMR vaccine and autism. It said, if I recall correctly, that out of 60 studies on the matter, not one had indicated a link.
Presumably, with 95% confidence.
This bothered me. What are the odds, supposing there is no link between X and Y, of conducting 60 studies of the matter, and of all 60 concluding, with 95% confidence, that there is no link between X and Y?
Answer: .95 ^ 60 = .046. (Use the first term of the binomial distribution.)
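In R, the same number comes out either as a direct power or as the k = 0 term of a Binomial(60, 0.05) distribution, assuming (as the post does) that each study independently has a 5% chance of a false positive when there is no link:

```r
# Chance that none of 60 independent studies reports a link when there is
# no true effect, assuming a 5% false-positive rate per study.
0.95^60                            # about 0.046
dbinom(0, size = 60, prob = 0.05)  # same value: P(0 positives) under Binomial(60, 0.05)
```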
So if it were in fact true that 60 out of 60 studies failed to find a link between vaccines and autism at 95% confidence, this would prove, with 95% confidence, that studies in the literature are biased against finding a link between vaccines and autism.
In reality, you should adjust your literature survey for the known biases of the literature. Scientific literature has publication bias: positive results are more likely to be reported than negative results.
It also has a bias from errors. Many articles have some fatal flaw that makes their results meaningless. If the distribution of errors is random, I think (though I'm not sure) that we should assume this bias causes regression towards an equal likelihood of positive and negative results.
Given that both of these biases should result, in this case, in more positive results, having all 60 studies agree is even more incredible.
So this morning I did a quick mini-review, looking over all of the studies cited in 6 reviews of the research on whether there is a connection between vaccines and autism:
National Academies Press (2004). Immunization safety review: Vaccines and autism.
National Academies Press (2011). Adverse effects of vaccines: Evidence and causality.
American Academy of Pediatrics (2013). Vaccine safety studies.
The current AAP webpage on vaccine safety studies.
The Immunization Action Coalition: Examine the evidence.
Taylor et al. (2014). Vaccines are not associated with autism: an evidence-based meta-analysis of case-control and cohort studies. Vaccine Jun 17;32(29):3623-9. Paywalled, but references given here.
I listed all of the studies that were judged usable in at least one of these reviews, removed duplicates, then went through them all and determined, either from the review article or from the study's abstract, what it concluded. There were 39 studies used, and all 39 failed to find a connection between vaccines and autism. 4 studies were rejected as methodologically unsound by all reviews that considered them; 3 of the 4 found a connection.
(I was, as usual, irked that if a study failed to prove the existence of a link given various assumptions, it was usually cited as having shown that there was no link.)
I understand that even a single study indicating a connection would immediately be seized on by anti-vaccination activists. (I've even seen them manage to take a study that indicated no connection, copy a graph in that study that indicated no connection, and write an analysis claiming it proved a connection.) Out there in the real world, maybe it's good to suppress any such studies. Maybe.
But here on LessWrong, where our job is not physical health, but mental practice, we shouldn't kid ourselves about what the literature is doing. Our medical research methodologies are not good enough to produce 39 papers and have them all reach the right conclusion. The chance of this happening is only .95 ^ 39 = 0.135, even before taking into account publication and error bias.
Note: This does not apply in the same way to reviews that show a link between X and Y
If the scientific community felt compelled to revisit the question of whether gravity causes objects to fall, and conducted studies using a 95% confidence threshold comparing apples dropped on Earth to apples dropped in deep space, we would not expect 5% of the studies to conclude that gravity has no effect on apples. 95% confidence means that, even if there is no link, there's a 5% chance the data you get will look as if there is a link. It does not mean that if there is a link, there's a 5% chance the data will look as if there isn't. (In fact, if you're wondering how small studies and large studies can all have 95% confidence, it's because, by convention, the extra power in large studies is spent on being able to detect smaller and smaller effects, not on higher and higher confidence that a detected effect is real. Being able to detect smaller and smaller effects means having a smaller and smaller chance that, if there is an effect, it will be too small for your study to detect. Having "95% confidence" tells you nothing about the chance that you're able to detect a link if it exists. It might be 50%. It might be 90%. This is the information black hole that priors disappear into when you use frequentist statistics.)
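The point that a fixed 5% threshold says nothing about the chance of detecting a real link can be seen with base R's power.prop.test. The outcome rates below are hypothetical, chosen only to contrast a modest effect with a "gravity-sized" one; n is the number of subjects per group.

```r
# Power (chance of detecting a real effect) at the SAME 0.05 significance
# level, for several per-group sample sizes. Rates are hypothetical.
sizes <- c(50, 200, 800)

# Modest effect: 10% vs 15% of subjects affected
modest <- sapply(sizes, function(n)
  power.prop.test(n = n, p1 = 0.10, p2 = 0.15, sig.level = 0.05)$power)

# Huge effect: 10% vs 90% of subjects affected
huge <- sapply(sizes, function(n)
  power.prop.test(n = n, p1 = 0.10, p2 = 0.90, sig.level = 0.05)$power)

round(rbind(n = sizes, modest = modest, huge = huge), 3)
```

Every study here has "95% confidence", yet the small ones should miss the modest effect most of the time, while the huge effect is essentially never missed.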
Critiquing bias
One plausible mechanism is that people look harder for methodological flaws in papers they don't like than in papers that they like. If we allowed all 43 papers (the 39 usable ones plus the 4 rejected ones), we'd have 3 / 43 finding a link, which would still be surprisingly low, but possible.
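Under the same idealized model used above (no true link, a 5% false-positive rate per study, studies independent, and no adjustment for publication or error bias), the number of studies finding a link among 43 follows a Binomial(43, 0.05) distribution, so the relevant probabilities can be computed directly:

```r
# Idealized model: no true link, each of 43 independent studies has a
# 5% chance of a false positive. Ignores publication and error bias.
n_studies <- 43
expected_positives <- n_studies * 0.05                    # mean number of "link" results
p_at_most_3 <- pbinom(3, size = n_studies, prob = 0.05)   # P(3 or fewer positives)
p_exactly_3 <- dbinom(3, size = n_studies, prob = 0.05)   # P(exactly 3 positives)
c(expected = expected_positives, at_most_3 = p_at_most_3, exactly_3 = p_exactly_3)
```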
To test this, I looked at Magnuson 2007, "Aspartame: A Safety Evaluation Based on Current Use Levels, Regulations, and Toxicological and Epidemiological Studies" (Critical Reviews in Toxicology 37:629–727). This review was the primary (in fact, nearly the only) source cited by the most recent FDA panel to review the safety of aspartame. The paper doesn't mention that its writing was commissioned by companies that sell aspartame. Googling the authors' names revealed that at least 8 of the paper's 10 authors worked for companies that sell aspartame, either at the time they wrote it or shortly afterwards.
I went to section 6.9, "Observations in humans", and counted the number of words spent discussing possible methodological flaws in papers that indicated a link between aspartame and disease, versus the number of words spent discussing possible methodological flaws in papers that indicated no link. I counted only words suggesting problems with a study, not words describing its methodology.
224 words were spent critiquing 55 studies indicating no link, an average of 4.1 words per study. 1375 words were spent critiquing 24 studies indicating a link, an average of 57.3 words per study.
(432 of those 1375 words were spent on a long digression arguing that formaldehyde isn't really carcinogenic, so that figure drops to 39.3 words per positive-result study if we exclude it. But that digression is... so bizarre that I'm not going to exclude it.)
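The per-study averages can be recomputed from the raw counts given above (the counts are the post's; the variable names are just illustrative):

```r
# Word counts from section 6.9 of Magnuson 2007, as tallied in the post.
critique_words <- c(no_link = 224, link = 1375)
n_studies      <- c(no_link = 55,  link = 24)
digression     <- 432  # formaldehyde digression, part of the 1375 "link" words

# Words of critique per study: about 4.1 (no link) and 57.3 (link)
round(critique_words / n_studies, 1)

# Per positive-result study, excluding the digression: about 39.3
round((critique_words["link"] - digression) / n_studies["link"], 1)
```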