neq1 comments on Error detection bias in research - Less Wrong

Post author: neq1 22 September 2010 03:00AM (54 points)

Comment author: Morendil 22 September 2010 08:02:21AM 11 points

"I would not be surprised if at least 20% of published studies include results that were affected by at least one coding error."

My intuition is that this underestimates the occurrence, depending on the field. Let us define:

  • CE = study has been affected by at least one coding error
  • SP = study relies on a significant (>500 LOC) amount of custom programming

Then I'd assign over 80% to P(CE|SP).
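One way to see why this would push the overall rate above the 20% figure: by the law of total probability, and because the second term is never negative,

$$P(CE) = P(CE \mid SP)\,P(SP) + P(CE \mid \neg SP)\,P(\neg SP) \;\ge\; P(CE \mid SP)\,P(SP).$$

So with $P(CE \mid SP) > 0.8$, we get $P(CE) > 0.8\,P(SP)$, which exceeds $0.2$ whenever $P(SP) \ge 0.25$. The share of studies with that much custom code is an assumption here, not a figure from the thread, which is where "depending on the field" comes in.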

My mom is a semi-retired neuroscientist, and she has recently been telling me how appalled she is at how many researchers around her abuse standard stats packages in egregious ways. The trouble is that scientists have access to powerful data-analysis packages but often don't understand the concepts those packages implement, and consequently make absurd mistakes.

"Shooting yourself in the foot" is the occupational disease of programmers, and it afflicts even non-career programmers: people who program as a secondary part of their job and may not even realize that what they're doing is programming.

Comment author: neq1 22 September 2010 10:56:01AM 3 points

In cases where a scientist is using a software package they are uncomfortable with, I think the output is basically the only error check. First, they copy some sample code and try to adapt it to their data (without really understanding what the program does). Then they run the software. If the results are about what they expected, they think "well, we must have done it right." If the results differ from what they expected, they might try a few more times and eventually get someone involved who knows what they are doing.
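A toy sketch of why checking only surprising output biases what gets reported. It assumes a single true effect, a fixed tolerance for "about what they expected", and that any bug is found and fixed once someone knowledgeable looks. The function `publish` and all the numbers are hypothetical illustrations, not anything from the thread.

```python
from statistics import mean


def publish(true_effect, expected, bug_shifts, tolerance):
    """Return the result that would be reported for each analysis.

    Hypothetical model: each analysis adds a bug shift (0 means no bug)
    to the true effect. Output within `tolerance` of the expectation is
    accepted as-is; anything else is debugged, so the true effect is
    reported instead.
    """
    reported = []
    for shift in bug_shifts:
        output = true_effect + shift
        if abs(output - expected) <= tolerance:
            # Looks right, so nobody goes looking for errors.
            reported.append(output)
        else:
            # Surprising result: someone who knows the tools fixes the bug.
            reported.append(true_effect)
    return reported


# Bugs are symmetric around zero, so they average out before any checking.
shifts = [-10, -8, 0, 8, 10]
reported = publish(true_effect=0, expected=10, bug_shifts=shifts, tolerance=3)
bias = mean(reported) - 0  # reported mean minus the true effect

print(reported)  # [0, 0, 0, 8, 10]
print(bias)      # 3.6
```

The bugs that pushed results away from the expectation were caught, while the ones that pushed toward it survived, so the reported results drift toward what the researcher expected even though the raw bugs were unbiased.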