Comment author: neq1 22 September 2010 11:31:39PM 9 points [-]

In the first example, you couldn't play unless you had at least 100M dollars of assets. Why would someone with that much money risk 100M to win a measly 100K, when the expected payoff is so bad?

Comment author: Morendil 22 September 2010 08:02:21AM 11 points [-]

I would not be surprised if at least 20% of published studies include results that were affected by at least one coding error.

My intuition is that this underestimates the occurrence, depending on the field. Let us define:

  • CE = study has been affected by at least one coding error
  • SP = study relies on a significant (>500 LOC) amount of custom programming

Then I'd assign over 80% to P(CE|SP).

My mom is a semi-retired neuroscientist; she's been telling me recently how appalled she's been at how many researchers around her abuse standard stats packages in egregious ways. The trouble is that scientists have access to powerful software packages for data analysis but often lack understanding of the concepts deployed in the packages, and consequently make absurd mistakes.

"Shooting yourself in the foot" is the occupational disease of programmers, and this applies even to non-career programmers, people who program as a secondary requirement of their job and may not even have any awareness that what they're doing is programming.

Comment author: neq1 22 September 2010 10:56:01AM 3 points [-]

In cases where a scientist is using a software package that they are uncomfortable with, I think the output basically serves as the only error checking. First, they copy some sample code and try to adapt it to their data (while not really understanding what the program does). Then, they run the software. If the results are about what they expected, they think "well, we must have done it right." If the results are different than they expected, they might try a few more times and eventually get someone involved who knows what they are doing.

Comment author: CronoDAS 22 September 2010 03:27:21AM 29 points [-]

Feynman once talked about this specific issue during a larger speech:

We have learned a lot from experience about how to handle some of the ways we fool ourselves. One example: Millikan measured the charge on an electron by an experiment with falling oil drops, and got an answer which we now know not to be quite right. It's a little bit off, because he had the incorrect value for the viscosity of air. It's interesting to look at the history of measurements of the charge of the electron, after Millikan. If you plot them as a function of time, you find that one is a little bigger than Millikan's, and the next one's a little bit bigger than that, and the next one's a little bit bigger than that, until finally they settle down to a number which is higher.

Why didn't they discover that the new number was higher right away? It's a thing that scientists are ashamed of--this history--because it's apparent that people did things like this: When they got a number that was too high above Millikan's, they thought something must be wrong--and they would look for and find a reason why something might be wrong. When they got a number closer to Millikan's value they didn't look so hard. And so they eliminated the numbers that were too far off, and did other things like that. We've learned those tricks nowadays, and now we don't have that kind of a disease.

Comment author: neq1 22 September 2010 03:33:37AM 1 point [-]

Good find. Thanks.

Error detection bias in research

54 neq1 22 September 2010 03:00AM

I have had the following situation happen several times during my research career: I write code to analyze data; there is some expectation about what the results will be; after running the program, the results are not what I expected; I go back and carefully check the code to make sure there are no errors; sometimes I find an error.

No matter how careful you are when it comes to writing computer code, I think you are more likely to find a mistake if you think there is one.  Unexpected results lead one to suspect a coding error more than expected results do.

Researchers usually do have general expectations about what they will find (e.g., the drug will not increase the risk of the disease; the toxin will not decrease the risk of cancer).

Consider the following graphic:

Here, the green region is consistent with what our expectations are.  For example, if we expect a relative risk (RR) of about 1.5, we might not be too surprised if the estimated RR is between (e.g.) 0.9 and 2.0.  Anything above 2.0 or below 0.9 might make us highly suspicious of an error -- that's the red region.  Estimates in the red region are likely to trigger serious coding error investigation.  Obviously, if there is no coding error then the paper will get submitted with the surprising results.
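This selection effect is easy to see in a toy simulation. Everything below is a made-up assumption for illustration (the error rate, the audit probabilities, and the way a bug distorts the estimate); the point is only the mechanism: bugs that produce expected results are rarely caught, so they end up over-represented among surviving results.

```python
import random

random.seed(1)

GREEN = (0.9, 2.0)   # estimates consistent with expectations (green region above)
TRUE_RR = 1.5
P_ERROR = 0.3        # assumed rate of coding errors per analysis (hypothetical)
P_AUDIT = {True: 0.9, False: 0.1}   # surprising results get audited far more often

landed = {True: 0, False: 0}      # where buggy estimates fall (surprising or not)
surviving = {True: 0, False: 0}   # bugs that survive into the reported result
for _ in range(100_000):
    estimate = random.gauss(TRUE_RR, 0.4)
    if random.random() >= P_ERROR:
        continue                             # this analysis has no bug
    estimate *= random.choice([0.5, 2.0])    # the bug distorts the estimate
    surprising = not (GREEN[0] <= estimate <= GREEN[1])
    landed[surprising] += 1
    if random.random() >= P_AUDIT[surprising]:
        surviving[surprising] += 1           # bug never caught

share_landed = landed[False] / (landed[True] + landed[False])
share_surviving = surviving[False] / (surviving[True] + surviving[False])
print(f"{share_landed:.0%} of bugs give 'expected' results, "
      f"but {share_surviving:.0%} of surviving bugs do")
```

Under these assumptions only a minority of bugs happen to land in the green region, yet most of the bugs that survive to publication are exactly those.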

Comment author: neq1 19 September 2010 03:41:38PM 5 points [-]

Error finding: I strongly suspect that people are better at finding errors if they know there is an error.

For example, suppose we did an experiment where we randomized computer programmers into two groups. Both groups are given computer code and asked to try to find a mistake. The first group is told that there is definitely one coding error. The second group is told that there might be an error, but there also might not be one. My guess is that, even if you give both groups the same amount of time to look, the first group would have a higher error-identification success rate.

Does anyone here know of a reference to a study that has looked at that issue? Is there a name for it?

Thanks

Comment author: PhilGoetz 16 September 2010 08:15:04PM *  0 points [-]

Wait a minute - when the Bayesian says "I think the coin probably has a chance near 50% of being heads", she's using data from prior observations of coin flips to say that. Which means that the frequentist might get the same answer if he added those prior observations to his dataset.

Comment author: neq1 17 September 2010 10:07:04AM 0 points [-]

Yes, that's a good point. That would be considered using a data augmentation prior (Sander Greenland has advocated such an approach).

Comment author: Oscar_Cunningham 16 September 2010 11:28:11AM *  1 point [-]

I hadn't seen that, but you're right that that sentence is wrong. "Probability" should have been replaced with "frequency" or something. A prior on a probability would be a set of probabilities of probabilities, and would soon lead to infinite regress.

Comment author: neq1 16 September 2010 01:33:14PM 1 point [-]

only if you keep specifying hyper-priors, which there is no reason to do

Comment author: cousin_it 16 September 2010 09:44:20AM *  1 point [-]

I don't understand how you can hold a position like that and still enjoy the post. How do you parse the phrase "my prior for the probability of heads" in the second example?

Comment author: neq1 16 September 2010 01:31:36PM 2 points [-]

In the second example the person was speaking informally, but there is nothing wrong with specifying a probability distribution for an unknown parameter (and that parameter could be a probability for heads)

Comment author: MC_Escherichia 16 September 2010 10:55:34AM 3 points [-]

If the null hypothesis was true, the probability that we would get 3 heads or less is 0.08

Is the idea that the coin will land heads 90% of the time really something that can be called the "null hypothesis"?

Comment author: neq1 16 September 2010 12:07:51PM 0 points [-]

Hm, good point. Since the usual thing is .5, the claim should be the alternative. I was thinking in terms of trying to reject their claim (which it wouldn't take much data to do), but I do think my setup was non-standard. I'll fix it later today

Bayes' rule =/= Bayesian inference

37 neq1 16 September 2010 06:34AM

Related to: Bayes' Theorem Illustrated, What is Bayesianism?, An Intuitive Explanation of Bayes' Theorem

(Bayes' theorem is something Bayesians need to use more often than Frequentists do, but Bayes' theorem itself isn't Bayesian. This post is meant to be a light introduction to the difference between Bayes' theorem and Bayesian data analysis.)

Bayes' Theorem

Bayes' theorem is just a way to get (e.g.) p(B|A) from p(A|B) and p(B).  The classic example of Bayes' theorem is diagnostic testing.  Suppose someone either has the disease (D+) or does not have the disease (D-) and either tests positive (T+) or tests negative (T-).  If we knew the sensitivity P(T+|D+), specificity P(T-|D-) and disease prevalence P(D+), then we could get the positive predictive value P(D+|T+) using Bayes' theorem:

P(D+|T+) = P(T+|D+)P(D+) / [P(T+|D+)P(D+) + P(T+|D-)P(D-)]

For example, suppose we know the sensitivity=0.9, specificity=0.8 and disease prevalence is 0.01.  Then, since P(T+|D-) = 1 - specificity = 0.2,

P(D+|T+) = (0.9)(0.01) / [(0.9)(0.01) + (0.2)(0.99)] ≈ 0.043

This answer is not Bayesian or frequentist; it's just correct.
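The arithmetic can be checked in a couple of lines:

```python
# Positive predictive value via Bayes' theorem, using the numbers above
sens = 0.9    # P(T+|D+), sensitivity
spec = 0.8    # P(T-|D-), specificity
prev = 0.01   # P(D+), disease prevalence

ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
print(round(ppv, 3))  # 0.043
```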

Diagnostic testing study

Typically we will not know P(T+|D+) or P(T-|D-).  We would consider these unknown parameters.  Let's denote them by Θsens and Θspec.  For simplicity, let's assume we know the disease prevalence P(D+) (we often have a lot of data on this). 

Suppose 1000 subjects with the disease were tested, and 900 of them tested positive.  Suppose 1000 disease-free subjects were tested and 200 of them tested positive.  Finally, suppose 1% of the population has the disease.

Frequentist approach

Estimate the 2 parameters (sensitivity and specificity) using their sample values (sample proportions) and plug them in to Bayes' formula above.  This results in a point estimate for P(D+|T+) of 0.043.  A standard error or confidence interval could be obtained using the delta method or bootstrapping.

Even though Bayes' theorem was used, this is not a Bayesian approach.
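A minimal sketch of the plug-in estimate, with a simple bootstrap for the interval (resampling each group's 0/1 test results and recomputing the PPV each time):

```python
import random

random.seed(0)
prev = 0.01              # known disease prevalence
sens_hat = 900 / 1000    # sample sensitivity
fpr_hat = 200 / 1000     # sample false-positive rate (1 - specificity)

def ppv(sens, fpr):
    """Bayes' formula for P(D+|T+) given sensitivity and false-positive rate."""
    return sens * prev / (sens * prev + fpr * (1 - prev))

point = ppv(sens_hat, fpr_hat)   # plug-in point estimate, about 0.043

# Bootstrap: resample 1000 Bernoulli outcomes per group, recompute the PPV
boots = []
for _ in range(2000):
    s = sum(random.random() < sens_hat for _ in range(1000)) / 1000
    f = sum(random.random() < fpr_hat for _ in range(1000)) / 1000
    boots.append(ppv(s, f))
boots.sort()
ci = (boots[int(0.025 * len(boots))], boots[int(0.975 * len(boots))])
```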

Bayesian approach

The Bayesian approach is to specify prior distributions for all unknowns.  For example, we might specify independent uniform(0,1) priors for Θsens and Θspec.  However, we should expect the test to do at least as well as guessing (guessing would mean randomly selecting 1% of people and calling them T+). In addition, we expect Θsens>1-Θspec. So, I might go with a Beta(4,2.5) distribution for Θsens and a Beta(2.5,4) for Θspec.

Using these priors + the data yields a posterior distribution for P(D+|T+) with posterior median 0.043 and 95% credible interval (0.038, 0.049).  In this case, the Bayesian and frequentist approaches have the same results (not surprising since the priors are relatively flat and there are a lot of data).  However, the methodology is quite different.
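Because a Beta prior is conjugate to binomial data, the posteriors are again Beta: Θsens | data ~ Beta(4+900, 2.5+100) and Θspec | data ~ Beta(2.5+800, 4+200). The posterior for P(D+|T+) can then be sketched by simulation:

```python
import random

random.seed(0)
prev = 0.01   # known disease prevalence

def ppv(sens, spec):
    """Bayes' formula for P(D+|T+) given sensitivity and specificity."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

# Draw from the conjugate posteriors: Beta(prior_a + successes, prior_b + failures)
draws = sorted(
    ppv(random.betavariate(4 + 900, 2.5 + 100),     # Θsens | data
        random.betavariate(2.5 + 800, 4 + 200))     # Θspec | data
    for _ in range(50_000)
)
median = draws[len(draws) // 2]
ci = (draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))])
```

With flat-ish priors and this much data, the simulated median and interval come out very close to the frequentist answer, as noted above.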

Example that illustrates benefit of Bayesian data analysis

(example edited to focus on credible/confidence intervals)

Suppose someone shows you what looks like a fair coin (you confirm heads on one side, tails on the other) and makes the claim: "This coin will land with heads up 90% of the time."

Suppose the coin is flipped 5 times and lands with heads up 4 times.

Frequentist approach

"A 95% confidence interval for the Binomial parameter is (.38, .99) using the Agresti-Coull method."  Because 0.9 is within the confidence limits, the usual conclusion would be that we do not have enough evidence to rule it out.

Bayesian approach

"I don't believe you.  Based on experience and what I know about the laws of physics, I think it's very unlikely that your claim is accurate. I feel very confident that the probability is close to 0.5.  However, I don't want to rule out something a little bit unusual (like a probability of 0.4).  Thus, my prior for the probability of heads is a Beta(30,30) distribution."

After seeing the data, we update our belief about the binomial parameter.  The 95% credible interval for it is (0.40, 0.64).  Thus, a value of 0.9 is still considered extremely unlikely.
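The Beta prior is conjugate here too: a Beta(30,30) prior updated with 4 heads and 1 tail gives a Beta(34,31) posterior, and the credible interval can be checked by simulation:

```python
import random

random.seed(0)
# Posterior after 4 heads, 1 tail with a Beta(30, 30) prior
samples = sorted(random.betavariate(34, 31) for _ in range(100_000))
lo, hi = samples[2_500], samples[97_500]   # 2.5th and 97.5th percentiles
print(f"95% credible interval: ({lo:.2f}, {hi:.2f})")
```

Unlike the confidence interval above, this interval excludes 0.9 entirely.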

This illustrates the idea that, from a Bayesian perspective, implausible claims require more evidence than plausible claims.  Frequentists have no formal way of including that type of prior information.
