V_V comments on Too good to be true - LessWrong

24 Post author: PhilGoetz 11 July 2014 08:16PM




Comment author: V_V 12 July 2014 08:47:34PM 1 point [-]

I'm going to look for an effect so large that, if there is no link between X and Y, the data will conspire against me only 5% of the time to look as if there is.

I think it should say "only at most 5% of the time".

Comment author: Douglas_Knight 12 July 2014 09:21:48PM 4 points [-]

No, we are choosing the effect size before we do the study. We choose it so that if the true effect is zero, we will have a false positive exactly 5% of the time.
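This is easy to check by simulation. A minimal sketch (my own illustration, not from the thread): draw many samples under the null (true effect = 0) and count how often a one-sided z-test at the 5% level falsely reports an effect.

```python
# Simulate the false positive rate of a one-sided z-test at the 5% level
# when the null hypothesis (true effect = 0) is actually true.
import random
import statistics

random.seed(0)
n, trials = 100, 20_000
z_crit = 1.645  # one-sided 5% critical value for a standard normal

false_positives = 0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]  # null world: true mean 0
    z = statistics.mean(sample) / (1.0 / n ** 0.5)       # known sigma = 1
    if z > z_crit:
        false_positives += 1

print(false_positives / trials)  # close to 0.05
```

The rejection rate hovers around 5% no matter how many trials you run, because the critical value was chosen, before seeing any data, to make it so.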

Comment author: jbay 17 July 2014 12:39:12PM *  2 points [-]

How does this work for a binary quantity?

If your experiment tells you that [x > 45] with 99% confidence, you may in certain cases be able to confidently transform that to [x > 60] with 95% confidence.

For example, if your experiment tells you that the mass of the Q particle is 1.5034(42) with 99% confidence, maybe you can say instead that it's 1.5034(32) with 95% confidence (the interval shrinks because the confidence level drops).

If your experiment happens to tell you that [particle Q exists] is true with 99% confidence, what kind of transformation can you apply to get 95% confidence instead? Discard some of your evidence? Add noise into your sensor readings?

Roll dice before reporting the answer?

Comment author: Douglas_Knight 17 July 2014 02:56:56PM 1 point [-]

We're not talking about a binary quantity.

Comment author: V_V 14 July 2014 10:46:51AM 2 points [-]

According to Wikipedia:

In statistical significance testing, the p-value is the probability of obtaining a test statistic result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.[1][2] A researcher will often "reject the null hypothesis" when the p-value turns out to be less than a predetermined significance level, often 0.05[3][4] or 0.01.
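For a one-sided test with a standard-normal statistic under the null, the quoted definition can be computed directly (a sketch of my own; the example statistic is assumed, not from the thread):

```python
# The p-value per the quoted definition: the probability, assuming the null,
# of a test statistic at least as extreme as the one observed.
import math

def one_sided_p_value(z_obs: float) -> float:
    # P(Z >= z_obs) for Z ~ N(0, 1), via the complementary error function
    return 0.5 * math.erfc(z_obs / math.sqrt(2))

print(one_sided_p_value(1.645))  # ~0.05, the conventional significance level
print(one_sided_p_value(2.326))  # ~0.01
```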

Comment author: Cyan 14 July 2014 02:23:29PM 2 points [-]

You want size, not p-value. The difference is that size is a "pre-data" (or "design") quantity, while the p-value is post-data, i.e., data-dependent.

Comment author: V_V 20 July 2014 03:36:19PM 2 points [-]

Thanks.

So if I set the size at 5%, collect the data, run the test, and then repeat the whole experiment with fresh data many times, should I expect that, if the null hypothesis is true, the test rejects exactly 5% of the time, or at most 5% of the time?

Comment author: Cyan 20 July 2014 04:10:27PM 2 points [-]

If the null hypothesis is simple (that is, if it picks out a single point in the hypothesis space), and the model assumptions are true blah blah blah, then the test (falsely) rejects the null with exactly 5% probability. If the null is composite (comprises a non-singleton subset of parameter space), and there is no nice reduction to a simple null via mathematical tricks like sufficiency or the availability of a pivot, then the test falsely rejects the null with at most 5% probability.

But that's all very technical; somewhat less technically, almost always, a bootstrap procedure is available that obviates these questions and gets you to "exactly 5%"... asymptotically. Here "asymptotically" means "if the sample size is big enough". This just throws the question onto "how big is big enough," and that's context-dependent. And all of this is about one million times less important than the question of how well each study addresses systematic biases, which is an issue of real, actual study design and implementation rather than mathematical statistical theory.
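One common version of such a bootstrap procedure (a sketch under assumptions of my own; the thread does not specify one) tests H0: mean = 0 by resampling the data after centering it, so the resampling population satisfies the null:

```python
# Illustrative bootstrap test of H0: mean = 0. Centering the data enforces the
# null on the resampling population; the bootstrap means then approximate the
# null distribution of the sample mean.
import random
import statistics

random.seed(1)
data = [random.gauss(0.3, 1.0) for _ in range(80)]  # true mean 0.3, unknown to the test

obs_mean = statistics.mean(data)
centered = [x - obs_mean for x in data]  # a null-consistent population

B = 5000
boot_means = []
for _ in range(B):
    resample = [random.choice(centered) for _ in range(len(data))]
    boot_means.append(statistics.mean(resample))

# Two-sided bootstrap p-value: how often a null-world mean is as extreme as observed
p = sum(abs(m) >= abs(obs_mean) for m in boot_means) / B
print(p)
```

The "asymptotically exact 5%" claim is about the size of this procedure as the sample grows; at any fixed sample size it is only approximate.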

Comment author: Douglas_Knight 14 July 2014 02:32:00PM 3 points [-]

Quoting authorities without further commentary is a dick thing to do. I am going to spend more words speculating about the intention of the quote than are in the quote, let alone than you bothered to type.

I have no idea what you think is relevant about that passage. It says exactly what I said, except transformed from the effect size scale to the p-value scale. But somehow I doubt that's why you posted it. The most common problem in the comments on this thread is that people confuse false positive rate with false negative rate, so my best guess is that you are making that mistake and thinking the passage supports that error (though I have no idea why you're telling me). Another possibility, slightly more relevant to this subthread, is that you're pointing out that some people use other p-values. But in medicine, they don't. They almost always use 95%, though sometimes 90%.

Comment author: V_V 20 July 2014 03:37:02PM 1 point [-]

My confusion is about "at least" vs. "exactly". See my answer to Cyan.