At the Nucleic Acid Observatory (NAO) we're evaluating pathogen-agnostic surveillance. A key question is whether metagenomic sequencing of wastewater can be a cost-effective method to detect and mitigate future pandemics. In this report we investigate one piece of this question: at a given stage of a viral pandemic, what fraction of wastewater metagenomic sequencing reads would that virus represent?
To make this concrete, we define RA(1%). If 1% of people are infected with some virus (prevalence) or have become infected with it during a given week (incidence), RA(1%) is the fraction of sequencing reads (relative abundance) generated by a given method that would match that virus. To estimate RA(1%) we collected public health data on sixteen human-infecting viruses, re-analyzed sequencing data from four municipal wastewater metagenomic studies, and linked them with a hierarchical Bayesian model.
Three of the viruses were not present in the sequencing data, and we could only generate an upper bound on RA(1%). Four viruses had a handful of reads, for which we were able to generate rough estimates. For the remaining nine viruses we were able to narrow down RA(1%) for a specific virus-method combination to approximately an order of magnitude. We found RA(1%) for these nine viruses varied dramatically, over approximately six orders of magnitude. It also varied by study, with some viruses seeing an RA(1%) three orders of magnitude higher in one study than another.
The NAO plans to use the estimates from this study as inputs into a modeling framework to assess the cost effectiveness of wastewater MGS detection under different pandemic scenarios, and we include an outline of such a framework with some rough estimates of the costs of different monitoring approaches.
Read the full report: Predicting Virus Relative Abundance in Wastewater.
(Posting my LessWrong comment here)
>If you're paying $8k per billion reads
>This will likely go down: Illumina has recently released the more cost effective NovaSeq X, and as Illumina's patents expire there are various cheaper competitors.
Indeed it did go down. Recently I paid $13,000 for 10 billion reads (NovaSeq X, Broad Institute; this was for my meiosis project). So sequencing costs can be much lower than $8K/billion.
Illumina is planning to start offering a 25 billion read flowcell for the NovaSeq X in October; I don't know how much this will cost but I'd guess around $20,000.
ALSO: if you're trying to detect truly novel viruses, using a Kraken database made from existing viral sequences is not going to work! However, many important threats are variants of existing viruses, so those could be detected (although possibly with lower efficiency).
> Recently I paid $13,000 for 10 billion reads
Thanks for sharing the pricing you've been getting!
One thing that makes this a bit confusing is that in the context of paired end reads, "read" can either mean a single forward or reverse read, or it can mean a read pair (a gapped read). As we note under Table 3, we're using the latter: each read is (roughly) an independent observation from the sample. Illumina uses the former, though, and maybe you are as well? In which case instead of $1.3k per billion reads (6x cheaper) you paid $2.6k (3x cheaper)?
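To make the two readings concrete, here is a minimal sketch of the arithmetic using only the figures quoted in this thread ($8k per billion reads from the post, $13,000 for a run quoted as 10 billion reads). The function name and its `counts_are_pairs` flag are made up for illustration.

```python
# Figures quoted in the thread (USD).
REPORT_COST_PER_BILLION = 8_000  # "$8k per billion reads" from the post
RUN_COST = 13_000                # quoted price for the NovaSeq X run
RUN_READS_BILLIONS = 10          # quoted as "10 billion reads"


def cost_per_billion_pairs(run_cost, reads_billions, counts_are_pairs):
    """Cost per billion read pairs (hypothetical helper).

    If the quoted count is single forward/reverse reads rather than
    pairs, two reads make one pair, so the pair count is halved.
    """
    pairs_billions = reads_billions if counts_are_pairs else reads_billions / 2
    return run_cost / pairs_billions


for counts_are_pairs in (True, False):
    cost = cost_per_billion_pairs(RUN_COST, RUN_READS_BILLIONS, counts_are_pairs)
    ratio = REPORT_COST_PER_BILLION / cost
    label = "read pairs" if counts_are_pairs else "single reads"
    print(f"count quoted as {label}: ${cost:,.0f} per billion pairs, "
          f"about {ratio:.1f}x cheaper than $8k")
```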
> if you're trying to detect truly novel viruses, using a Kraken database made from existing viral sequences is not going to work!
Definitely! The work in this report wasn't to identify novel viruses; it was to understand how much sequencing we might need to do to get some number of reads of a novel virus, as a step toward that longer-term goal.
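As a rough illustration of that "how much sequencing" question, here is a sketch of the back-of-the-envelope calculation. It assumes relative abundance scales linearly with the fraction of people infected, which is a simplification made for this example. The RA(1%) value and the target read count below are placeholders, not estimates from the report, and the function name is hypothetical.

```python
def total_reads_needed(target_virus_reads, ra_1pct, fraction_infected):
    """Total reads (read pairs) needed to expect `target_virus_reads`
    matching reads.

    Assumption (for illustration only): relative abundance is
    proportional to the fraction infected, anchored at RA(1%).
    """
    expected_relative_abundance = ra_1pct * (fraction_infected / 0.01)
    return target_virus_reads / expected_relative_abundance


# Placeholder inputs, NOT values from the report.
ra_1pct = 1e-7            # hypothetical RA(1%) for some virus and method
fraction_infected = 0.01  # 1% of people infected
target_reads = 100        # hypothetical number of matching reads wanted

reads = total_reads_needed(target_reads, ra_1pct, fraction_infected)
cost = reads / 1e9 * 8_000  # at the post's "$8k per billion reads"
print(f"{reads:.2e} reads needed, roughly ${cost:,.0f}")
```

Because RA(1%) varied over roughly six orders of magnitude across viruses, the answer from a calculation like this would swing by the same factor depending on the virus and the sequencing method.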
The price was for 10 billion read pairs (so 6x cheaper). The specifications are here: https://www.illumina.com/systems/sequencing-platforms/novaseq-x-plus/specifications.html
Thanks!
Thanks for sharing!
I am glad cost-effectiveness modelling is seemingly at the core of NAO's research.