Critical factors to consider for deep sequencing from clinical samples
In the last few months, we have been involved in several COVID-19-related activities, and one of our main goals was to provide an effective way to retrieve viral sequences from swab samples. For this purpose we mainly adopted two methods: amplicon-Seq (based on the ARTIC protocol; we will discuss those results in another post) and RNA-Seq.
Amplicon-Seq might be sufficient to retrieve a consensus sequence for a given viral template and can reach very high sensitivity, but it suffers from several limitations. The major drawback of this method is amplification drop-out, which limits more detailed and focused research. Given the limited amount of template (cDNA) retrieved from a nasopharyngeal swab specimen, the number of PCR cycles is substantially increased to obtain a working library. Consequently, if the sample contains any de novo mutation or a co-infection by multiple sources (i.e. different viral haplotypes), variants could be lost due to amplification drop-outs, resulting in an unreliable estimation of relative abundances. This bias arises randomly, which makes it unpredictable and hard to detect in any given amplicon.
RNA-Seq, on the other hand, can provide a much broader spectrum of information because it sequences all the RNA molecules in the sample without targeting, which makes it possible to assess the microbial colonization of the nasopharynx as well as the occurrence of secondary viral infections.
One of the major obstacles in using nasal swab samples is the extremely low amount of extracted RNA, which is usually adequate for standard RT-PCR assays but not for standard, reproducible NGS experiments. Since we have successfully used Tecan Genomics’ solutions for extremely low-input materials (e.g. RIP-Seq, degraded RNA) via their proprietary SPIA cDNA amplification technology, we chose the Trio RNA-Seq library preparation kit for processing our COVID-19 swab samples. A valuable advantage of the SPIA technology is that it is a highly reproducible method of isothermal cDNA amplification that maintains the original relative abundance of molecules.
The amounts of starting material available at the beginning of the process are shown in the table below. In spite of the very low quantities, all libraries were successfully prepared (with adequate molarity for quantification and loading estimation) and sequenced on the NovaSeq 6000 platform in PE150 mode.
Nine out of eleven samples with CT<23 yielded >97% of the SARS-CoV-2 genome at a minimum of 50X coverage, and this was achieved by sequencing an average of 5.3M reads/sample. For samples in the 23<CT<27 range, seven out of nine yielded >95% of the viral genome at a minimum coverage of 5X, but this required an increased average sequencing effort of up to 50M reads/sample. Samples with CT>27 usually failed to deliver the full (>95%) viral genome at a minimum coverage of 5X, even when the sequencing effort was increased to >100M reads/sample. For the samples achieving less than 50% of the viral genome at a minimum of 20X coverage, the average CT value was 28.5, while for samples achieving or exceeding this result the average CT value was 22.1.
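The figures above are all of the form "fraction of the genome covered at a minimum depth". As an illustration only, and not the pipeline we actually ran, here is a minimal Python sketch of that calculation. It assumes per-base depths are available as tab-separated reference, position and depth lines (the three-column layout written by tools such as `samtools depth -a`); the function names and the toy values are hypothetical.

```python
from typing import Iterable, List


def parse_depth_lines(lines: Iterable[str]) -> List[int]:
    """Parse 'reference<TAB>position<TAB>depth' lines into a list of depths."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _ref, _pos, depth = line.split("\t")
        depths.append(int(depth))
    return depths


def breadth_at_depth(depths: List[int], min_depth: int) -> float:
    """Fraction of genome positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("no positions supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Toy input: 10 positions with made-up depths (not data from our samples).
toy_lines = [
    "ref\t%d\t%d" % (pos, depth)
    for pos, depth in enumerate([0, 3, 5, 20, 50, 60, 75, 4, 51, 100], start=1)
]
toy_depths = parse_depth_lines(toy_lines)

print(breadth_at_depth(toy_depths, 5))   # 7 of 10 positions reach 5X
print(breadth_at_depth(toy_depths, 50))  # 5 of 10 positions reach 50X
```

Note that every reference position must be present in the input, including those with zero depth; otherwise the denominator shrinks and the breadth is overestimated.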
It is noteworthy that CT values were estimated several days before the samples were shipped to our lab; therefore, it cannot be excluded that certain samples underwent some RNA degradation.
Taken together, the results were highly satisfactory, as we were able to effectively apply an unbiased RNA-Seq approach to nasopharyngeal swab samples. For some samples, the sequencing depth required to cover the SARS-CoV-2 genome was as low as 5M reads, which translates to a negligible cost. The only limiting factor for cost-effective use of this technique is the real viral load (here approximated by CT values). Hence, for large cohorts, we recommend RT-PCR screening prior to processing, in order to avoid spending effort on samples with CT>27.
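For a large cohort, this screening step could be reduced to a simple lookup on the RT-PCR result. The sketch below is one possible way to encode it in Python, not a validated rule: the tiers and read counts are the averages observed in this small set of samples, the `Triage` class and `triage_by_ct` function are hypothetical names, and the handling of CT values of exactly 23 or 27 is our assumption, since the ranges above leave those boundaries open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triage:
    tier: str
    # Approximate sequencing effort in millions of reads; None means "do not sequence".
    suggested_reads_millions: Optional[float]


def triage_by_ct(ct: float) -> Triage:
    """Map an RT-PCR CT value to a sequencing tier (thresholds from this post)."""
    if ct < 23:
        # Average effort that gave >97% of the genome at 50X in most samples.
        return Triage("high viral load", 5.3)
    if ct <= 27:
        # Assumption: CT of exactly 23 or 27 is placed in the intermediate tier.
        return Triage("intermediate viral load", 50.0)
    # CT > 27 usually failed even at >100M reads, so it is flagged to skip.
    return Triage("low viral load, skip or deprioritize", None)


for ct in (19.4, 25.0, 30.2):  # illustrative CT values, not real samples
    print(ct, triage_by_ct(ct))
```

Because CT values can drift between the original assay and library preparation (for example through RNA degradation during storage or shipment), samples close to a threshold are best treated as belonging to the more demanding tier.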