Severe problems with the biomedical research process
GiveWell has recently been investigating ways to improve biomedical research. When I read GiveWell's research on this subject, I was struck by how severe and pervasive the problems with the field seem to be:
From a conversation with Ferric Fang:
Because scientists have to compete for grants, they spend a very large fraction of their time fundraising, sometimes more than 50% of their working hours. Scientists feel [strong] pressure to optimize their activities for getting tenure and grants, rather than for doing good science.
From a conversation with Elizabeth Iorns:
Researchers are rewarded primarily for publishing papers in prestigious journals such as Nature, Science and Cell. These journals select for papers that report on surprising and unusual findings. Papers that report on unsound research that is apparently exciting are more likely to be published than papers which report on less exciting research that is sound.
There is little post-publication check on the soundness of papers’ findings, because journals, especially prestigious ones, generally don’t publish replications, and there is little funding for performing replications.
[…]
Pharmaceutical companies such as Bayer and Amgen have studied the frequency with which studies are reproducible by trying to reproduce them, and they have found that about 70% of published papers in the areas that they considered don’t reproduce.
[…]
Because many published results are not reproducible, it is difficult for scientists to use the published literature as a basis for deciding what experiments to perform.
[…]
As things stand, the pharmaceutical industry does perform replications; however, these are generally unpublished. Because a given lab doesn’t know whether other labs have found that a study fails to replicate, labs duplicate a lot of effort.
From a conversation with Ken Witwer:
Dr. Witwer published a study in Clinical Chemistry examining 127 papers published between July 2011 and April 2012 in journals that ostensibly require researchers to deposit their microarray data. He found that the data were not submitted for almost 60% of the papers, and that the data for 75% of the papers were not in a format suitable for replication.
The above remarks give the impression that the problems are deeply entrenched and mutually reinforcing. At first glance, it seems that while one might be able to make incremental improvements (such as funding a journal that publishes replications), the prospects for big improvements are very poor. But I became more hopeful after learning more.
The Rising Sea
The great mathematician Alexander Grothendieck wrote about two approaches to solving a difficult problem:
If you think of a theorem to be proved as a nut to be opened, so as to reach “the nourishing flesh protected by the shell”, then the hammer and chisel principle is: “put the cutting edge of the chisel against the shell and strike hard. If needed, begin again at many different points until the shell cracks—and you are satisfied”.
[…]
I can illustrate the second approach with the same image of a nut to be opened. The first analogy that came to my mind is of immersing the nut in some softening liquid, and why not simply water? From time to time you rub so the liquid penetrates better, and otherwise you let time pass. The shell becomes more flexible through weeks and months—when the time is ripe, hand pressure is enough, the shell opens like a perfectly ripened avocado!
A different image came to me a few weeks ago. The unknown thing to be known appeared to me as some stretch of earth or hard marl, resisting penetration … the sea advances insensibly in silence, nothing seems to happen, nothing moves, the water is so far off you hardly hear it …. yet it finally surrounds the resistant substance.
When a nut seems too hard to crack, it’s wise to think about the second method that Grothendieck describes.
Alternative Metrics
I was encouraged by GiveWell’s subsequent conversations with David Jay and Jason Priem, which suggest a “rising sea” type of solution to the cluster of apparently severe problems with biomedical research.
In brief, the idea is that it may be possible to create online communities and interfaces that can be used to generate measures of how valuable researchers find research outputs; these measures could then be used for funding and tenure decisions, thereby rewarding the production of the research outputs that other researchers find most valuable. If incentives become aligned with producing valuable research, the whole system will shift accordingly, greatly reducing the existing inefficiencies.
From a conversation with Jason Priem:
Historically, the academic community has filtered academic outputs for interest by peer review and, more specifically, the prestige of the journals where papers are published. This model is inadequate relative to filtering mechanisms that are now in principle possible using the Internet.
It is now possible to use the web to measure the quality and impact of an academic output via alternative metrics (altmetrics) such as
- How many people downloaded it
- How much it has been discussed on Twitter
- How many websites link to it
- The caliber of the scientists who have recommended it
- How many people have saved it in a reference manager like Mendeley or Zotero
This is similar to how Google generates a list of webpages corresponding to a search term: one can benefit from PageRank-type algorithms that foreground popular content in an intelligent fashion.
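As a rough illustration of what a PageRank-type ranking over research outputs might look like (my own sketch, not something described in the conversation; the papers and links below are hypothetical), a simple power iteration over a citation/link graph scores outputs by how often other well-scored outputs point to them:

```python
# Minimal PageRank-style sketch over a hypothetical graph of research outputs,
# where an edge A -> B means output A cites or links to output B.
# Illustration only; a real altmetrics system would combine many signals.

def pagerank(links, damping=0.85, iterations=50):
    """Return a score for each node via simple power iteration."""
    nodes = set(links) | {n for targets in links.values() for n in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical citation/link graph among four papers.
links = {
    "paper_A": ["paper_B", "paper_C"],
    "paper_B": ["paper_C"],
    "paper_C": ["paper_A"],
    "paper_D": ["paper_C"],
}
for paper, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(paper, round(score, 3))
```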
[…]
There’s been a significant amount of interest from funders and administrators in more nuanced and broader measures of researcher impact than their journal publication record. […] Algorithmically generated rankings of researchers’ influence, as measured by the altmetrics mentioned previously, could be an input into hiring, tenure, promotion, and grant decisions. ImpactStory and other providers of alternative metrics could help researchers aggregate their online impact so that they can present good summaries of it to administrators and funders.
From a conversation with David Jay:
Commenting systems could potentially be used to create much more useful altmetrics. Such altmetrics could be generated for a scientific output by examining the nature of the comments that scientists make about it, weighting the comments using factors such as the number of upvotes that a comment receives and how distinguished the commenter is.
The metrics generated would be more informative than a journal publication record, because commenters give more specific feedback than the acceptance/rejection of a paper submitted to a given journal does.
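As a hedged sketch of how such a comment-based metric might be computed (my own illustration; the field names, reputation weights, and weighting scheme are assumptions, not anything specified in the conversation), one could weight each comment by its upvotes and by the commenter's standing:

```python
# Hypothetical sketch of a comment-based altmetric: weight each comment on a
# paper by its upvote count and by a reputation weight for the commenter.
# Field names and weighting scheme are illustrative assumptions, not a real system.
from math import log1p

def comment_score(comments, reputation):
    """Aggregate a paper's comments into a single score."""
    total = 0.0
    for c in comments:
        upvote_weight = log1p(c["upvotes"])            # diminishing returns on upvotes
        commenter_weight = reputation.get(c["author"], 1.0)
        total += upvote_weight * commenter_weight
    return total

# Toy data: two comments on one paper, with per-commenter reputation weights.
reputation = {"senior_scientist": 3.0, "grad_student": 1.0}
comments = [
    {"author": "senior_scientist", "upvotes": 12},
    {"author": "grad_student", "upvotes": 40},
]
print(round(comment_score(comments, reputation), 2))
```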
[…]
If scientists were to routinely use online commenting systems to discuss scientific outputs, it seems likely that the altmetrics generated from these comments would be robust enough to be used for hiring, promotion and grant-making decisions (in conjunction with, or in place of, the traditional metric of journal publication record).
[…]
David Jay envisages a future in which there is [...] A website which collects analytics from other websites so as to aggregate the impact of individual researchers, both for their own information and for use by hiring/promotion/grant committees.
The viability of this approach remains to be seen, but it could work very well, and it illustrates the general principle behind Grothendieck’s “rising sea”: rather than attacking each entrenched problem directly, change the surrounding incentives so that the problems soften together.
About the author: I worked as a research analyst at GiveWell from April 2012 to May 2013. All views expressed here are my own.
Comments
Agreed that improved incentives for truth-seeking would improve details across the board, while local procedural patches would tend to be circumvented.
The first three metrics seem like they could encourage sexy bogus findings even more strongly, by giving the general public more of a role: the science press seems to respond strongly to press releases and unsubstantiated findings, as do website hits (I say this based on the "most emailed" and "most read" categories in the NYTimes science section).
A fundamental problem seems to be that the prior probability that any given tested hypothesis is true has fallen, driven by the increased number of researchers, the use of automation, and the incentive to go hypothesis-fishing.
Wouldn't a more direct solution be to simply increase the significance threshold required in the field?
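To make the interaction between that prior and the significance threshold concrete, here is a rough sketch (my own illustration, not from the post or the conversations; the prior, power, and threshold values are hypothetical): the share of statistically significant findings that are actually true falls as the prior falls, and tightening the threshold (a smaller alpha) partially compensates.

```python
# Rough illustration (hypothetical numbers): how the share of "significant"
# findings that are actually true depends on the prior probability that a
# tested hypothesis is true and on the significance threshold alpha.

def positive_predictive_value(prior, alpha, power=0.8):
    """P(hypothesis is true | result is statistically significant)."""
    true_positives = prior * power          # true hypotheses that reach significance
    false_positives = (1 - prior) * alpha   # false hypotheses that reach significance anyway
    return true_positives / (true_positives + false_positives)

for prior in (0.5, 0.1, 0.01):              # falling prior as hypothesis-fishing increases
    for alpha in (0.05, 0.005):             # current vs. stricter significance threshold
        ppv = positive_predictive_value(prior, alpha)
        print(f"prior={prior:>5}, alpha={alpha:>6}: PPV={ppv:.2f}")
```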