Metabarcoding sequencing: moving from MiSeq to NovaSeq

25 November 2020

DNA metabarcoding, i.e. targeted sequencing of taxonomically informative genetic markers, is an efficient method for assessing and monitoring biological diversity. Microbiome research is advancing rapidly, boosted by sequencing of the 16S rRNA gene and the ITS region, while noninvasive surveys of plant and animal species richness across different ecosystems are now routinely carried out by sequencing plastid or mitochondrial loci such as rbcL, matK and COI (cox1). 

Currently, the Illumina MiSeq is the predominant metabarcoding platform in most sequencing labs, including ours. The MiSeq does its job well; however, as the number of samples grows (batches of several hundred or several thousand), it becomes quite slow and expensive. This problem can be solved by adopting Illumina's newest high-capacity platform, the NovaSeq 6000, which offers approximately 700 times the sequencing output of the MiSeq. 

Some concerns about switching instruments may lie in the fact that the MiSeq produces longer reads than the NovaSeq (300 bp vs. 250 bp), which allows for more accurate taxonomic classification. Recently we came across a study that examined the ability of the MiSeq and the NovaSeq to detect the biological diversity present in seawater using two amplicons within the standard COI barcode region. It is a must-read article for anyone who has doubts about moving from MiSeq to NovaSeq. We assure you that it will convert any skeptic!  

Here are the major observations:  

  • The NovaSeq detected many more taxa, i.e. more exact sequence variants (ESVs), than the MiSeq thanks to its much greater sequencing depth (on average, 7 million vs. 100,000 reads per sample per amplicon for the NovaSeq and the MiSeq, respectively). 

  • (Unexpectedly!) The NovaSeq detected more DNA sequence diversity within samples than the MiSeq even in depth-for-depth comparisons (i.e. at exactly the same sequencing depth).  

  • At the same sequencing depth there was substantial overlap between the ESVs found by the two platforms; however, the MiSeq had very few ESVs unique to itself, while the NovaSeq found many ESVs that the MiSeq missed. 

  • Adding more sequencing depth to the MiSeq only marginally improved its detection of diversity, which plateaued at 1 million reads per amplicon. In contrast, the NovaSeq continued to detect new sequence variants up to 10 million reads. Thus, extremely deep sequencing is required for a comprehensive survey of biodiversity. 

  • The NovaSeq was generally able to detect more families within each order than the MiSeq; when using only one marker, the NovaSeq detected 200% more metazoan families than the MiSeq. 

  • Low-abundance eDNA sequences are less likely to be detected by the MiSeq than by the NovaSeq. 
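
The depth-for-depth comparisons and the "plateau" observation both come down to counting ESVs at a fixed number of reads. A common way to do this is random subsampling without replacement (rarefaction). The Python sketch below illustrates the idea on a hypothetical toy sample; the function names and data are ours, and we are not claiming this is exactly how the authors of the study subsampled their reads. Real pipelines work on denoised ESV tables rather than raw lists of labels.

```python
import random

def rarefy(reads, depth, seed=0):
    """Randomly subsample `depth` reads without replacement (rarefaction)."""
    if depth > len(reads):
        raise ValueError("depth exceeds the number of available reads")
    return random.Random(seed).sample(reads, depth)

def esv_richness(reads):
    """Number of distinct exact sequence variants (ESVs) among the reads."""
    return len(set(reads))

def depth_matched_richness(reads_a, reads_b, seed=0):
    """Depth-for-depth comparison: rarefy both samples to the smaller depth."""
    depth = min(len(reads_a), len(reads_b))
    return (esv_richness(rarefy(reads_a, depth, seed)),
            esv_richness(rarefy(reads_b, depth, seed)))

def saturation_curve(reads, depths, seed=0):
    """ESV richness at increasing depths. Prefixes of a single random shuffle
    are nested random subsamples, so the curve never decreases."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    return [(d, esv_richness(shuffled[:d])) for d in depths]

# Hypothetical toy sample: two abundant variants plus 200 rare ones (8,200 reads).
sample = ["ESV_A"] * 5000 + ["ESV_B"] * 3000 + [f"rare_{i}" for i in range(200)]
curve = saturation_curve(sample, [100, 1000, 5000, len(sample)])
for depth, richness in curve:
    print(depth, richness)
```

At full depth the toy sample yields all 202 variants; at shallower depths many of the rare ones are simply not drawn, which is the same reason low-abundance ESVs go undetected when sequencing depth is limited.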

Keep in mind that the exact same PCR products were sequenced on both instruments, so the differences cannot be attributed to stochastic PCR biases.

In general, relying on the MiSeq may leave a great deal of biodiversity undetected. Whether this has a significant impact on a study will, of course, depend on the nature of that study. In any case, the NovaSeq's superior performance seems to stem from its patterned flow cell, which prevents similar sequences that are neighbors on the flow cell (common in metabarcoding studies) from being erroneously merged into a single spot by the sequencing instrument. As the article explains, "in amplicon-based sequencing the variability from one spot to the next, especially within the first 25 bases which likely covers primer regions, is minimal and this can cause two distinct spots to be merged together. To prevent this from happening, Illumina recommends spiking in PhiX genome, but unless the proportion of PhiX is very high it is almost impossible to prevent similar sequences from sitting near each other on the flow cell. Conversely, the spots that DNA anneal to on the NovaSeq flow cell are pre-defined and known by the instrument's base calling software, so inferring their location is not necessary and this largely prevents the 'over-clustering' of low diversity reads." 
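
To make the "low diversity in the first bases" point concrete, the Python sketch below computes the per-cycle base composition of a library and flags cycles where a single base dominates. Everything here is hypothetical: the 16-nt primer, the evenly mixed toy reads standing in for amplicon tails and for a diverse spike-in such as PhiX, and the 0.9 threshold. It only illustrates base-composition diversity across reads; it does not simulate how the instrument detects or merges clusters.

```python
from collections import Counter

BASES = "ACGT"
PRIMER = "ACGTTGCAACGTTGCA"  # hypothetical 16-nt primer, not from the study

def per_cycle_composition(reads, n_cycles):
    """Fraction of reads carrying A/C/G/T at each of the first n_cycles positions."""
    composition = []
    for i in range(n_cycles):
        counts = Counter(r[i] for r in reads if len(r) > i)
        total = sum(counts.values())
        composition.append({b: counts[b] / total for b in BASES} if total else {})
    return composition

def low_diversity_cycles(reads, n_cycles=25, threshold=0.9):
    """1-based cycle numbers where one base accounts for >= threshold of reads."""
    return [
        i + 1
        for i, comp in enumerate(per_cycle_composition(reads, n_cycles))
        if comp and max(comp.values()) >= threshold
    ]

def cyclic_reads(n_reads, length, prefix=""):
    """Toy reads: a shared prefix, then a tail whose bases are spread evenly over A/C/G/T."""
    return [prefix + "".join(BASES[(j + k) % 4] for k in range(length))
            for j in range(n_reads)]

def spiked_library(n_total, spike_fraction):
    """Hypothetical amplicon library (shared primer) mixed with a diverse spike-in."""
    n_spike = round(n_total * spike_fraction)
    amplicons = cyclic_reads(n_total - n_spike, 20, prefix=PRIMER)
    spike = cyclic_reads(n_spike, 36)  # stand-in for a diverse genome such as PhiX
    return amplicons + spike

print(low_diversity_cycles(spiked_library(100, 0.0)))  # cycles 1 to 16 flagged
print(low_diversity_cycles(spiked_library(100, 0.1)))  # cycles 1 to 16 still flagged
print(low_diversity_cycles(spiked_library(100, 0.5)))  # []
```

In this toy setup a 10% spike-in leaves every primer cycle dominated by one base, and only a very large spike-in fraction brings the composition down, which mirrors the article's remark that the PhiX proportion must be very high to help.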

Hope you found this post useful. 

Stay tuned for the results of an ongoing experiment in our lab, in which we are testing the effect of read length (300 bp vs. 250 bp) in 16S rRNA metabarcoding.

 

P.S. You can find information on our DNA metabarcoding services at https://igatechnology.com/genomics-research-services/plantanimal/metabarcoding/