Vaniver comments on Welcome to Less Wrong! (5th thread, March 2013) - Less Wrong
I feel like this could use a longer explanation, especially since I think you're not hearing Lumifer's point, so let me give it a shot. (I'm not sure I see a meaningful difference between base rate neglect and selection bias in this circumstance.)
The word "grok" in Viliam_Bur's comment is really important. This part of the grandparent is true:
But it's like saying "well, assume the diagnosis is correct. Then the treatment will make the patient better with high probability." While true, that's totally out of touch with reality: we can't assume the diagnosis is correct, and a huge part of being a doctor is responding correctly to that uncertainty.
Earlier, Lumifer said this, which is an almost correct explanation of using Bayes in this situation:
The part that makes it "almost" is the "5% of the times, more or less." This implies that the rate is centered around 5%, with random chance determining what happens in this instance. But selection bias means it will almost certainly be more, and generally much more. In a field that studies a phenomenon that doesn't exist, 100% of the published papers will report false results that reached significance by chance. In many real fields, rates of failure to replicate are around 30%. Describing 30% as "5%, more or less" seems odd, to say the least.
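One way to grok the base-rate point is to compute what fraction of "significant" results are false, as a function of how many of a field's hypotheses are actually true. A minimal sketch, assuming a conventional 80% power figure (the function name and defaults are mine, not from the thread):

```python
def frac_published_false(prior_true, alpha=0.05, power=0.8):
    """Fraction of 'significant' results that are false positives,
    given the base rate of true effects among tested hypotheses."""
    true_pos = prior_true * power          # true effects that reach significance
    false_pos = (1 - prior_true) * alpha   # null effects significant by chance
    return false_pos / (true_pos + false_pos)
```

With `prior_true = 0` (a field studying a phenomenon that doesn't exist) this gives 1.0, i.e. every significant result is false; with `prior_true = 0.1` it gives roughly 0.36, in the neighborhood of the replication-failure rates mentioned above.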
But the proposal to reduce the p value doesn't solve the underlying problem (which was Lumifer's response). If we set the p value threshold lower, at .01 or .001 or wherever, we reduce the risk of false positives at the cost of increasing the risk of false negatives. A study design that needs to detect an effect at the .001 level is much more expensive than one that needs to detect it at the .05 level, so we will have many fewer studies attempted, and far fewer published studies.
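The cost point can be made concrete with the standard normal-approximation sample-size formula for a two-sample comparison of means. A sketch using only the Python standard library (the 80% power and 0.3 standardized effect size are illustrative assumptions of mine):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(alpha, power=0.8, effect_size=0.3):
    """Approximate sample size per group for a two-sample z-test to
    detect a standardized effect `effect_size` at two-sided `alpha`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = z.inv_cdf(power)           # quantile corresponding to power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)
```

Under these assumptions, moving the threshold from .05 to .001 roughly doubles the required sample per group, which is where the "many fewer studies attempted" comes from.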
Better to drop p entirely. Notice that stricter p thresholds push in the opposite direction from the publication of negative results, which is the real solution to the problem of selection bias. Calling for stricter p thresholds implicitly assumes that p is a worthwhile metric, when what we really want is publication of negative results and more replications.
My grandparent post was stupid, but what I had in mind was basically a phase II (or III) drug-trial situation. You have declared (at least to the FDA) that you're running a trial, so selection bias does not apply at this stage. You have two groups: one receives the experimental drug, the other receives a placebo. Assume a double-blind randomized scenario and assume there is a measurable metric of improvement at the end of the trial.
After the trial you have two groups with two empirical distributions of the metric of choice. The question is how confident you are that these two distributions are different.
Well, as usual, it's complicated. Yes, the p-test is suboptimal in most situations where it's used in practice. However, it fulfils a need, and if you drop the test entirely you will need a replacement, because the need won't go away.