16S amplicons loaded on a single Illumina lane (regardless of platform type or read length) tend to suffer from poor quality because of low base complexity. Every cluster on the flowcell starts with the same bases (the primers used for amplification) and mostly continues through a low-complexity region whose extent depends on the diversity of the microbial community (in any case far from a balanced base composition at each sequencing cycle; Fig. 1A). This impairs cluster identification during the first sequencing cycles: all clusters emit the same color, which reduces the ability to discriminate clusters that lie next to one another.
In addition, a completely biased base representation in the leading sequencing cycles causes an overall drop in base-call quality due to poor calibration of the fluorescence readout (Fig. 1B). Typically, the quality drop becomes dramatic over the last 75-100 bp of read 2 (R2) in MiSeq 300 bp paired-end sequencing.
Fig. 1: Standard 16S amplification protocol. (A) Base representation of the first 100 bp of R1 data. (B) Quality profile of amplicon data sequenced on the MiSeq platform with standard loading (manufacturer-suggested, <1000 K/mm²); R1 left, R2 right.
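The per-cycle base representation described above can be illustrated with a minimal Python sketch. The function name `per_cycle_composition` and the toy reads are hypothetical and serve only to show the idea; real data would be parsed from the R1 FASTQ file.

```python
from collections import Counter


def per_cycle_composition(reads, n_cycles):
    """Return, for each cycle, the fraction of A/C/G/T among the reads
    that are long enough to reach that cycle."""
    profile = []
    for cycle in range(n_cycles):
        counts = Counter(r[cycle] for r in reads if len(r) > cycle)
        total = sum(counts[b] for b in "ACGT")
        profile.append(
            {b: (counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return profile


# Hypothetical toy reads: every read starts with the same primer-like prefix,
# so the first cycles contain a single base across all clusters.
reads = ["CCTACGGG" + tail for tail in ("AGGC", "TGGC", "AGGC", "CGGC")]
profile = per_cycle_composition(reads, 12)

for cycle, fractions in enumerate(profile, start=1):
    print(cycle, fractions)
```

In this toy set the first eight cycles are fully dominated by one base each, which is the situation that hampers cluster discrimination; only at cycle 9 does the composition become mixed.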
By using custom primers specifically designed to improve base representation (without any modification to the actual annealing site, to maintain reproducibility with standard methods; Fig. 2A), we increase the overall quality of the sequencing output (Fig. 2B). This yields a more powerful dataset in terms of taxa identification and a larger number of usable reads for downstream analyses.
Fig. 2: PRO-amp 16S amplification protocol. (A) Base representation of the first 100 bp of R1 data. (B) Quality profile of amplicon data sequenced on the MiSeq platform with standard loading (manufacturer-suggested, <1000 K/mm²); R1 left, R2 right.
In a comparative test, the standard and PRO-amp protocols were applied to the same set of samples and sequenced on the same MiSeq run. The PRO-amp reads overlapped at their 3’ ends more efficiently (98% on average) than reads from the standard protocol (95% on average). In the past, we also observed that runs loaded entirely with standard 16S amplification libraries can suffer a dramatic quality drop, with 3’-end overlap efficiency falling to 75%. Overlap of the 3’ ends is an important attribute of 16S sequencing data because it allows reconstruction of a single, error-corrected PCR fragment. This can increase the power of taxonomic discrimination and the precision of clustering into operational taxonomic units (OTUs, i.e. clusters of fragments belonging to a common taxonomic level).
A diversity analysis carried out with both the standard and PRO-amp protocols on the same set of samples showed that the PRO-amp system does not introduce bias in the diversity estimate (Fig. 3).
Fig. 3: Bray-Curtis diversity analysis. Standard (blue) and PRO-amp (red) sample pairs align with each other.
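For readers unfamiliar with the metric, the Bray-Curtis dissimilarity between two samples can be sketched as follows. This is a minimal illustration, not the pipeline used for the figure; the function name `bray_curtis` and the taxon counts are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors:
    sum(|u_i - v_i|) / sum(u_i + v_i).
    0 means identical composition, 1 means no shared taxa."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Hypothetical counts for the same sample processed with both protocols
standard = [10, 20, 30]
pro_amp = [12, 18, 30]

print(bray_curtis(standard, pro_amp))
```

A value close to 0 for a standard/PRO-amp pair indicates that both protocols recover nearly the same community composition, which is what the alignment of the pairs in the figure reflects.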
Finally, to ensure that no specific enrichment occurred at any taxonomic level, an enrichment test (metagenomeSeq package, fitZIG function) was performed at each of the following levels:
- 10 phylum-level taxa
- 22 class-level taxa
- 38 order-level taxa
- 66 family-level taxa
- 124 genus-level taxa
- 260 species-level taxa
None of them showed a statistically significant enrichment.
What does this mean?
If you adopt the PRO-amp protocol, your 16S metagenomics data will be of higher quality. The only difference is that raw reads will be shorter by 1 to 10 bases, at random (with no bias toward specific samples). With a 16S amplicon of about 464 bp and MiSeq 300 bp paired-end sequencing, the 3’ ends will still overlap (and at a higher rate, given the improved base-call quality).
Data will be provided with primer sites already removed from sequences.
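The overlap claim can be checked with simple arithmetic. The sketch below assumes that both R1 and R2 may each lose up to 10 bases and that the amplicon length is exactly 464 bp; the function name `expected_overlap` is hypothetical, and quality trimming of read tails is not modeled.

```python
def expected_overlap(amplicon_len, read_len, r1_loss=0, r2_loss=0):
    """Number of bases covered by both reads of a pair:
    (usable R1 length + usable R2 length) - amplicon length.
    A positive value means the 3' ends overlap."""
    return (read_len - r1_loss) + (read_len - r2_loss) - amplicon_len


# Standard protocol: full 300 bp reads on a 464 bp amplicon
print(expected_overlap(464, 300))          # 136

# Assumed worst case for PRO-amp: both reads shorter by 10 bases
print(expected_overlap(464, 300, 10, 10))  # 116
```

Even in the assumed worst case, more than 100 bases remain shared between the two reads, so the shorter raw reads do not prevent merging of the pair.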