So the TOGETHER trial, signal-boosted by Scott of Slate Star Codex, found fluvoxamine to reduce COVID hospitalisation and fatality by 30%.

The NIH looked at the study and found it unconvincing, and I am a bit confused as to the rationale.

I'll list it out as I understand it:

  • the primary outcome [retention in the emergency department for >6 hours or admission to a tertiary hospital] was chosen without rationale
  • there was no significant difference in mortality between study arms in the intention-to-treat (ITT) population [however, mortality was 2% in the treatment arm and 3% in the placebo arm, roughly in line with the expected 30% reduction]
  • a significant difference was only found in patients who took >80% of their fluvoxamine doses; however, outcomes also improved for patients who took >80% of their placebo doses, suggesting that another mechanism [e.g. conscientiousness] is responsible for [most? all?] of the improvement in outcome.
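To make the second point concrete, here is a rough sketch of the arithmetic. It checks that 2% versus 3% mortality is about a one-third relative reduction, and then runs a plain two-proportion z-test. The arm size of 800 and the resulting death counts are hypothetical numbers picked to give round figures; they are not the trial's actual counts, and the z-test is a generic textbook test, not necessarily the analysis the trial or the NIH used.

```python
import math


def relative_risk_reduction(risk_treated, risk_control):
    """Fraction by which the treated risk is lower than the control risk."""
    return 1.0 - risk_treated / risk_control


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ASSUMPTION: hypothetical arm size, chosen only for illustration.
N_PER_ARM = 800
deaths_treated = 16  # 2% of 800
deaths_placebo = 24  # 3% of 800

rrr = relative_risk_reduction(deaths_treated / N_PER_ARM,
                              deaths_placebo / N_PER_ARM)
z, p_value = two_proportion_z_test(deaths_treated, N_PER_ARM,
                                   deaths_placebo, N_PER_ARM)

print(f"relative risk reduction: {rrr:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

Under these made-up counts the relative reduction is about 33%, but the p-value comes out at about 0.20, so a difference of this size can sit comfortably alongside "no significant difference" when deaths are rare.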

Is my understanding correct, and do the NIH's critiques of the study hold merit?


Daniel V

30
  1. Choice of primary outcome - it's a good idea to choose a primary outcome measure of relevance. This 6 hour threshold is a little weird (apparently because the study was done at clinical sites, not hospitals, so they wanted a sort of hospitalization-equivalent), but they targeted it from the outset, so it's not that troubling, though it may not be of interest. If the NIH is worried about any ED visit and not just long ones (or doesn't have a lot of faith in the equivalence), then this measure doesn't necessarily speak to what the NIH cares about. The NIH barely explains itself there, but TOGETHER may have shot itself in the foot. I'm sure they did the best they could.
  2. Non-significant (NS) difference in ITT analysis - I think as-treated (AT) or per-protocol (PP) analyses are generally appealing for studying efficacy, as the ITT estimate suffers from contamination, but the NIH's concern that a PP analysis kills the randomization is extremely legitimate. A PP analysis may suffer from self-selection and deserves a little more scrutiny when it is done not to get a better effect size estimate but to push a result over a statistical significance threshold. The ITT analysis also generates a better estimate of effectiveness in the real world, since imperfect adherence happens.
  3. 80% threshold for adherence - as you've probably noticed, the NIH is having issues with the credibility of the findings. If you are going to do a PP analysis post-hoc, presumably you have a reasonable adherence cutoff, right? Or did you choose that cutoff just to get through the significance filter? Well, in this case the NIH's concern is a little less fair, as the 80% threshold is not atypical for medical research, not that that makes it a great threshold.
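The self-selection worry in points 2 and 3 can be sketched with a toy calculation. Below, a placebo arm is split into two hypothetical groups that differ in both adherence and baseline risk, and the pill itself does nothing. Every number (group shares, adherence probabilities, outcome risks) is an assumption made up for illustration, not taken from TOGETHER.

```python
def risk_by_adherence(groups):
    """groups: list of (population_share, p_adherent, p_bad_outcome).

    Returns the bad-outcome risk among adherent and non-adherent patients,
    assuming the pill has no effect at all.
    """
    adherent_mass = sum(s * a for s, a, _ in groups)
    nonadherent_mass = sum(s * (1 - a) for s, a, _ in groups)
    adherent_risk = sum(s * a * r for s, a, r in groups) / adherent_mass
    nonadherent_risk = (
        sum(s * (1 - a) * r for s, a, r in groups) / nonadherent_mass
    )
    return adherent_risk, nonadherent_risk


# ASSUMPTION: hypothetical placebo arm with two equally sized groups.
placebo_arm = [
    (0.5, 0.9, 0.10),  # "conscientious": adheres often, lower baseline risk
    (0.5, 0.5, 0.20),  # less conscientious: adheres less, higher baseline risk
]

adherent_risk, nonadherent_risk = risk_by_adherence(placebo_arm)
overall_risk = sum(s * r for s, _, r in placebo_arm)

print(f"overall placebo risk:      {overall_risk:.1%}")
print(f"adherent placebo risk:     {adherent_risk:.1%}")
print(f"non-adherent placebo risk: {nonadherent_risk:.1%}")
```

With these made-up inputs the overall risk is 15%, but adherent placebo patients come out at roughly 13.6% and non-adherent ones at roughly 18.3%. That gap appears with an inert pill, which is why a PP subgroup defined by adherence can look better than the ITT population without any drug effect.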

Add in one trial that supports efficacy on an outcome measure the NIH likes but is small and, given its high attrition, could suffer from non-response bias. And add in another trial that was stopped for futility. You might update toward fluvoxamine having some efficacy and be relieved that it is prescribable, but you might stop short of recommending it as standard of care.