Comment author: paper-machine 23 July 2014 04:35:38PM 3 points [-]

The title's a lot funnier if you s/as/of/.

Comment author: PhilGoetz 23 July 2014 04:45:16PM 0 points [-]

Agreed. Done. Thanks!

Fifty Shades of Self-Fulfilling Prophecy

7 PhilGoetz 23 July 2014 04:27PM

The official story: "Fifty Shades of Grey" was a Twilight fan-fiction that had over two million downloads online. The publishing giant Vintage Press saw that number and realized there was a huge, previously unrealized demand for stories like this. They filed off the Twilight serial numbers, put it in print, marketed it like hell, and now it's sold 60 million copies.

The reality is quite different.

Comment author: Cyan 12 July 2014 03:00:57AM 9 points [-]

No; it's standard to set the threshold for your statistical test for 95% confidence. That's its statistical power.

"Power" is a statistical term of art, and its technical meaning is neither 1 - alpha nor 1 - p.

In response to comment by Cyan on Too good to be true
Comment author: PhilGoetz 12 July 2014 02:49:36PM *  2 points [-]

Oops; you're right. Careless of me; fixed.

In response to Too good to be true
Comment author: kilobug 12 July 2014 02:17:33PM 5 points [-]

I don't think the "95% confidence" works that way. It's a lower bound: you never try to publish anything with lower than 95% confidence (and if you do, your publication is likely to be rejected), but you don't always need to have exactly 95% (2 sigma).

Hell, I play enough RPGs to know that rolling a 1 or a 20 on a d20 is frequent enough ;) 95% is quite low confidence; it's really a minimum at which you can start working, not something optimal.

I'm not sure exactly how it works in medicine, but in physics it's common to have studies at 3 sigma (99.7%) or higher. The detection of the Higgs boson at the LHC, for example, was done at 5 sigma (roughly one chance in 3.5 million of seeing such a strong signal by chance).
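
A minimal sketch (not from the original comment) converting those sigma levels to tail probabilities of the normal distribution; note that the 95% and 99.7% figures above are two-sided, while discovery thresholds like the 5-sigma Higgs result are conventionally quoted one-sided:

from scipy.stats import norm

for sigma in (2, 3, 5):
    one_sided = norm.sf(sigma)        # P(Z > sigma): chance of a fluke at least this extreme
    two_sided = 2 * norm.sf(sigma)    # P(|Z| > sigma)
    print(f"{sigma} sigma: one-sided {one_sided:.2e}, two-sided {two_sided:.2e}")
# 5 sigma one-sided is about 2.9e-7, i.e. roughly one chance in 3.5 million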

Especially in a field with a high risk of the data being abused by ill-intentioned people, such as the "vaccine and autism" link, it would really surprise me if everyone just happily kept to 95% confidence and didn't aim for much higher.

Comment author: PhilGoetz 12 July 2014 02:45:45PM 6 points [-]

Especially in a field with a high risk of the data being abused by ill-intentioned people, such as the "vaccine and autism" link, it would really surprise me if everyone just happily kept to 95% confidence and didn't aim for much higher.

Okay. Be surprised. It appears that I've read hundreds of medical journal articles and you haven't.

Medicine isn't like physics. The data is incredibly messy. High sigma results are often unattainable even for things you know are true.

In response to Too good to be true
Comment author: Manfred 12 July 2014 12:50:51AM 3 points [-]

First, if you put something in your body, it has some effect, even if that effect is small. "No effect" results just rule out effects above various effect sizes (both positive and negative) with high probability, so there's no point talking about "a link" as if it were some discrete thing (you sort of jump back and forth between getting this one right and wrong).

Second, different studies will rule out different effect sizes with 95% confidence - or to put it another way, at a given effect size, different studies will have different p-values, and so your probability exercise was pretty pointless because you didn't compare the studies' opinions about any particular effect size, just "whatever was 95%."

Third, I'd bet a nickel the effect sizes ruled out at 95% in all of these studies are well below the point where it would become concerning (like, say, the effect of the parents being a year older). That is, these studies all likely rule out a concerning effect size with probability much better than 95%.
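
A minimal sketch (mine, not Manfred's) of the second and third points: the effect size a "no effect" study rules out at 95% confidence shrinks as the sample grows, so a large study can rule out a concerning effect with room to spare. The baseline rate and sample sizes below are assumptions, not figures from any actual study:

import numpy as np
from scipy.stats import norm

baseline_rate = 0.01     # hypothetical outcome rate in the control group
z = norm.ppf(0.975)      # two-sided 95% critical value, about 1.96

for n_per_group in (1_000, 10_000, 100_000):
    # approximate standard error of a difference between two proportions near the baseline
    se = np.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_group)
    print(f"n = {n_per_group:>7} per group: rules out absolute risk differences "
          f"larger than about {z * se:.4f}")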

Comment author: PhilGoetz 12 July 2014 01:38:10AM 2 points [-]

so your probability exercise was pretty pointless because you didn't compare the studies' opinions about any particular effect size

My probability exercise was not about effect size. It was about the probability of all studies agreeing by chance if there is in fact no link, and so the 95% confidence is what is relevant.

Third, I'd bet a nickel the effect sizes ruled out at 95% in all of these studies are well below the point where it would become concerning (like, say, the effect of the parents being a year older). That is, these studies all likely rule out a concerning effect size with probability much better than 95%.

Again, not relevant to the point I'm making here.

In response to Too good to be true
Comment author: benkuhn 12 July 2014 01:14:35AM *  5 points [-]

In your "critiquing bias" section you allege that 3/43 studies supporting a link is "still surprisingly low". This is wrong; it is actually surprisingly high. If B ~ Binom(43, 0.05), then P(B > 2) ~= 0.36.*

*As calculated by the following Python code:

from scipy.stats import binom

# B ~ Binomial(43, 0.05): the number of studies expected to find a "link"
# by chance alone if each has a 5% false-positive rate.
b = binom(43, 0.05)
p_less_than_3 = sum(b.pmf(i) for i in [0, 1, 2])  # P(B <= 2)
print(1 - p_less_than_3)                          # P(B >= 3) ~= 0.36
Comment author: PhilGoetz 12 July 2014 01:33:44AM 0 points [-]

I said "surprisingly low" because of publication & error bias.

Comment author: Mass_Driver 11 July 2014 10:10:21PM 3 points [-]

I'm confused about how this works.

Suppose the standard were to use 80% confidence. Would it still be surprising to see 60 of 60 studies agree that A and B were not linked? Suppose the standard were to use 99% confidence. Would it still be surprising to see 60 of 60 studies agree that A and B were not linked?

Also, doesn't the prior plausibility of the connection being tested matter for attempts to detect experimenter bias this way? E.g., for any given convention about confidence intervals, shouldn't we be quicker to infer experimenter bias when a set of studies conclude (1) that there is no link between eating lithium batteries and suffering brain damage vs. when a set of studies conclude (2) that there is no link between eating carrots and suffering brain damage?

Comment author: PhilGoetz 11 July 2014 10:33:41PM *  5 points [-]

"95% confidence" means "I am testing whether X is linked to Y. I know that the data might randomly conspire against me to make it look as if X is linked to Y. I'm going to look for an effect so large that, if there is no link between X and Y, the data will conspire against me only 5% of the time to look as if there is. If I don't see an effect at least that large, I'll say that I failed to show a link between X and Y."

If you went for 80% confidence instead, you'd be looking for an effect that wasn't quite as big. You'd be able to detect smaller clinical effects--for instance, a drug that has a small but reliable effect--but if there were no effect, you'd be fooled by the data 20% of the time into thinking that there was.
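
A minimal sketch (mine, assuming a simple one-sided z-test) of that trade-off by simulation: a lower confidence level means a smaller critical value, but more false alarms when there is truly no effect:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
z_scores = rng.normal(size=10_000)   # test statistics from 10,000 simulated studies of a null effect

for confidence in (0.95, 0.80):
    z_crit = norm.ppf(confidence)    # one-sided critical value for this confidence level
    false_alarm_rate = np.mean(z_scores > z_crit)
    print(f"{confidence:.0%} confidence: critical z = {z_crit:.2f}, "
          f"fooled by chance in {false_alarm_rate:.1%} of null studies")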

Also, doesn't the prior plausibility of the connection being tested matter for attempts to detect experimenter bias this way?

It would if the papers claimed to find a connection. When they claim not to find a connection, I think not. Suppose people decided to test the hypothesis that stock market crashes are caused by the Earth's distance from Mars. They would gather data on Earth's distance from Mars, and on movements in the stock market, and look for a correlation.

If there is no relationship, there should be zero correlation on average. That (approximately) means that half of all studies will show a negative correlation, and half a positive one.

They need to pick a number, and say that if they find a positive correlation above that number, they've proven that Mars causes stock market crashes. And they pick that number by finding the correlation just large enough that, if there is no relationship, a correlation that big turns up only 5% of the time by chance.

If the proposition is very very unlikely, somebody might insist on a 99% confidence interval instead of a 95% confidence interval. That's how prior plausibility would affect it. Adopting a standard of 95% confidence is really a way of saying we agree not to haggle over priors.
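
A minimal sketch (mine) of that procedure for the made-up Mars example: simulate many datasets in which there is no relationship at all, and pick the correlation exceeded by chance only 5% (or, for the stricter standard, 1%) of the time:

import numpy as np

rng = np.random.default_rng(0)
n_months, n_sims = 120, 10_000   # hypothetical: ten years of monthly observations

# Sample correlations between two series that are genuinely unrelated.
null_correlations = np.array([
    np.corrcoef(rng.normal(size=n_months), rng.normal(size=n_months))[0, 1]
    for _ in range(n_sims)
])
for confidence in (0.95, 0.99):
    threshold = np.quantile(null_correlations, confidence)
    print(f"threshold for {confidence:.0%} confidence: correlation > {threshold:.3f}")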

In response to comment by dvasya on Too good to be true
Comment author: dvasya 11 July 2014 08:26:59PM 2 points [-]

Also, different studies have different statistical power, so it may not be OK to simply add up their evidence with equal weights.

In response to comment by dvasya on Too good to be true
Comment author: PhilGoetz 11 July 2014 09:18:28PM *  0 points [-]

No; it's standard to set the threshold for your statistical test at 95% confidence. Studies with larger samples can detect smaller differences between groups with that same statistical power.

Too good to be true

20 PhilGoetz 11 July 2014 08:16PM

A friend recently posted a link on his Facebook page to an informational graphic about the alleged link between the MMR vaccine and autism. It said, if I recall correctly, that out of 60 studies on the matter, not one had indicated a link.

Presumably, with 95% confidence.

This bothered me. What are the odds, supposing there is no link between X and Y, of conducting 60 studies of the matter, and of all 60 concluding, with 95% confidence, that there is no link between X and Y?

Answer: 0.95^60 ≈ 0.046. (Use the first term of the binomial distribution.)
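
A quick check of that arithmetic, assuming each study is independent and has exactly a 5% false-positive rate when there is no real link:

from scipy.stats import binom

print(0.95 ** 60)              # about 0.046: chance that all 60 studies come up negative
print(binom.pmf(0, 60, 0.05))  # the same number, as the first term of Binomial(60, 0.05)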

So if it were in fact true that 60 out of 60 studies failed to find a link between vaccines and autism at 95% confidence, this would prove, with 95% confidence, that studies in the literature are biased against finding a link between vaccines and autism.

Comment author: Eugine_Nier 11 April 2013 04:44:35AM *  3 points [-]

For example, in messy topics like biology, most instances of "all" should be replaced with "most". In other words, people were translating the universal statements into probabilistic statements. They were subsequently confused when you insisted on treating the problem as logical rather than statistical.

Comment author: PhilGoetz 08 July 2014 04:58:41PM *  -1 points [-]

It is because it is a statistical problem that you can't replace "all" with "most". The F-value threshold was calculated assuming "all", not "most". You'd need a different threshold if you don't mean "all".

Also, the people I am complaining about explicitly use "all" when they interpret medical journal articles in which a test failed to find an effect as having proven that the effect does not exist for any patients.
