In April we released released
a tool to model
the efficacy of different approaches to stealth pathogen
identification. The tool's interface is pretty rough, which I'm not
super happy about, but there just aren't that many people in the world
who need to simulate the performance impact these design choices.
In response to this work we plan to update our metagenomic
biosurveillance simulator in two ways:
We'll switch the simulator's RAi(1%) from using the mean of the
distribution to sampling from the full posterior distribution. Because
our posteriors sometimes span several orders of magnitude, this change
should better capture our uncertainty.
We'll replace our preliminary influenza A RAi(1%) point estimate of
3.2e-8 with an option to choose each of the four above distributions,
with medians of 1.4e-8, 1.4e-8, 2.8e-9, and 7.0e-10.
Overall we expect these changes to make our projections higher
variance and somewhat less optimistic, but not to have a large impact
on whether this approach to novel pathogen detection is practical.
With Dan's help I've now made
both of these changes (#6,
#7,
#9),
and additionally:
Stopped assuming uniform coverage along the genome, and instead
use the distribution we've observed along SARS-CoV-2. (#5)
While it would be better to generate estimates from a wider range of
pathogens, everything else we've looked at in our samples has too much
genetic diversity for it to be easy to make these estimates. Now that
we've done some spike-ins,
however, I think it ought to be possible to use the highest
concentration spike-in sample to get second estimate, though I haven't
tried this. The overall effect of this change on our estimates should
be to increase variance while slightly lowering median performance.
Updated the pricing we see for the NovaSeq X 25B and switched
pricing to per-lane, based on the pricing we're seeing from sequencing
providers. (#10)
It's great that sequencing is continuing to get cheaper!
Let's compare what the two simulators say for one weekly NovaSeq X 25B run,
generating approximately 2e10 read pairs (SARS-CoV-2,
Flu A)
Note that lower is better here: the charts show the fraction of people
in the monitored sewershed who have ever been infected by the time we
raise the alarm.
Scenario
Cumulative Incidence at Detection
25th
50th
75th
90th
Old, SARS-CoV-2
0.24%
0.48%
0.84%
1.40%
New, SARS-CoV-2
0.53%
1.20%
2.90%
6.50%
Change in Sensitivity, SARS-CoV-2
-55%
-60%
-71%
-78%
Old, Flu A
0.46%
0.84%
1.60%
2.70%
New, Flu A
1.00%
2.50%
5.70%
14.00%
Change in Sensitivity, Flu A
-54%
-66%
-72%
-81%
This makes sense overall: the changes were expected to both make the
simulator less optimistic and increase the variance of its
predictions, and that's what we do see.
Cross-posted from my NAO Notebook.
In April we released released a tool to model the efficacy of different approaches to stealth pathogen identification. The tool's interface is pretty rough, which I'm not super happy about, but there just aren't that many people in the world who need to simulate the performance impact these design choices.
A month ago we published estimates of RAi(1%) for influenza in municipal wastewater, and ended that post with:
With Dan's help I've now made both of these changes (#6, #7, #9), and additionally:
Stopped assuming uniform coverage along the genome, and instead use the distribution we've observed along SARS-CoV-2. (#5) While it would be better to generate estimates from a wider range of pathogens, everything else we've looked at in our samples has too much genetic diversity for it to be easy to make these estimates. Now that we've done some spike-ins, however, I think it ought to be possible to use the highest concentration spike-in sample to get second estimate, though I haven't tried this. The overall effect of this change on our estimates should be to increase variance while slightly lowering median performance.
Updated the pricing we see for the NovaSeq X 25B and switched pricing to per-lane, based on the pricing we're seeing from sequencing providers. (#10) It's great that sequencing is continuing to get cheaper!
Let's compare what the two simulators say for one weekly NovaSeq X 25B run, generating approximately 2e10 read pairs (SARS-CoV-2, Flu A)
Note that lower is better here: the charts show the fraction of people in the monitored sewershed who have ever been infected by the time we raise the alarm.
This makes sense overall: the changes were expected to both make the simulator less optimistic and increase the variance of its predictions, and that's what we do see.
Comment via: facebook, lesswrong, mastodon