#LabNote: the impact of sequencing coverage in 16S metagenomics

04 June 2018

You have surely noticed by now that one of the most universal questions in NGS is:

How low can you go?

It refers to everything, from input quantity/quality to depth of sequencing coverage and, of course, prices.

Amplicon analysis of the 16S rRNA gene is currently the most common sequencing approach for exploring the microbiome. That is why we set out to evaluate the impact of sequencing coverage in a 16S metagenomics experiment and to establish the minimum number of reads/fragments needed to obtain reliable results.

We investigated the complexity of the microbial community at six different sequencing depths:

200,000 reads = 100,000 fragments

100,000 reads = 50,000 fragments

50,000 reads = 25,000 fragments

20,000 reads = 10,000 fragments

10,000 reads = 5,000 fragments

1,000 reads = 500 fragments

for six different samples: two soil samples, two human fecal samples, one equine fecal sample and one canine fecal sample.

After library preparation, sequencing was performed in 300 bp paired-end mode on the Illumina MiSeq platform, producing more than 200 × 10³ reads per sample. Seqtk was used for read subsampling, and QIIME for phylogenetic reconstruction and diversity analyses.
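The subsampling step can be sketched roughly as follows. This is only an illustration, not the exact commands used for this note: the file names, the seed and the helper `seqtk_commands` are hypothetical. It builds `seqtk sample` command lines, where `-s` sets the random seed and the last argument is the number of sequences to keep. Using the same seed for both mate files keeps the pairs in sync, and because one fragment yields two reads in paired-end mode, each read depth is halved to get the number of records drawn per file.

```python
# Read depths tested in this note (total reads per sample, both mates counted).
READ_DEPTHS = [200_000, 100_000, 50_000, 20_000, 10_000, 1_000]


def seqtk_commands(r1, r2, prefix, seed=100):
    """Build seqtk shell commands that subsample one paired-end sample.

    r1, r2, prefix and seed are placeholders chosen for illustration.
    """
    commands = []
    for reads in READ_DEPTHS:
        fragments = reads // 2  # paired-end: two reads per fragment
        for mate, path in (("R1", r1), ("R2", r2)):
            out = f"{prefix}_{reads}_{mate}.fastq"
            # Same seed for both mates so the same fragments are drawn.
            commands.append(f"seqtk sample -s{seed} {path} {fragments} > {out}")
    return commands


if __name__ == "__main__":
    for cmd in seqtk_commands("soil1_R1.fastq", "soil1_R2.fastq", "soil1"):
        print(cmd)
```

The function only prints the commands, so they can be reviewed before being run in a shell where seqtk is installed.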

 

Figure 1. Principal coordinate analysis (PCoA) plot showing the clustering based on 16S rRNA gene sequencing of samples at different sequencing coverage depths.

 

As shown in Figure 1, the species composition of each sample was identical across sequencing depths, and as few as 1,000 reads per sample were enough to distinguish the unique microbiome fingerprint of each sample. In addition, reduced sequencing coverage did not affect diversity as measured by the Shannon index, which was the same at 200,000 and at 1,000 reads (Figure 2).
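For readers who want to see what the Shannon index measures, here is a minimal sketch. The read counts are made up for illustration and the function `shannon_index` is our own helper, not a QIIME call; the logarithm base is a parameter because different tools report the index in different units. The index depends only on the relative abundance of each taxon, so two depths with the same proportions give the same value.

```python
import math


def shannon_index(counts, base=2):
    """Shannon diversity H = -sum(p_i * log(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads to compute diversity from")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total  # relative abundance of this taxon
            h -= p * math.log(p, base)
    return h


if __name__ == "__main__":
    # Hypothetical per-taxon read counts for one sample at two depths.
    deep = [120_000, 50_000, 20_000, 10_000]
    shallow = [600, 250, 100, 50]
    print(shannon_index(deep), shannon_index(shallow))
```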

These results reflect the fact that even in complex matrices a few dominant species account for most of the sample's biodiversity. Thus, even reduced coverage can satisfactorily capture the species composition needed for a general characterization of the sample.
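The dominance argument can be illustrated with a small simulation. Everything in it is an assumption made for the sake of the example: the community (four dominant taxa holding 90% of the reads plus 100 rare taxa), the seed and the helper names do not come from our data. It draws reads at random from that community at two depths and compares the resulting Shannon index (natural log here).

```python
import math
import random
from collections import Counter


def shannon(counts):
    """Shannon index (natural log) from a list of per-taxon read counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simulate(depths, seed=1):
    """Draw reads from one made-up community at several sequencing depths."""
    rng = random.Random(seed)
    # Assumed community: 4 dominant taxa (90% of reads) plus 100 rare taxa.
    weights = [0.40, 0.25, 0.15, 0.10] + [0.001] * 100
    taxa = list(range(len(weights)))
    results = {}
    for n in depths:
        reads = rng.choices(taxa, weights=weights, k=n)
        results[n] = shannon(list(Counter(reads).values()))
    return results


if __name__ == "__main__":
    for depth, h in simulate([200_000, 1_000]).items():
        print(depth, round(h, 2))
```

In this toy community the index at 1,000 reads stays close to the value at 200,000 reads, because the dominant taxa contribute most of it and are well sampled at both depths.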

Figure 2. Shannon diversity index curves showing the diversity of taxa present in the different samples as a function of sequencing coverage depth.

 

If you are still wondering how low you can go, you can probably go even lower! At that point, however, it becomes quite challenging to get the loading concentrations right in order to obtain so few sequences.

 

Stay tuned: one of our next topics will be a comparison of the 16S approach vs. whole genome sequencing for microbiome analysis.