Exome-Seq: dealing with degraded samples

06 August 2019

It is well known that next-generation sequencing requires high-molecular-weight gDNA input to construct quality libraries.

So, is there any hope for degraded samples? Can the protocol be adjusted to obtain meaningful data from low-integrity inputs?

To answer these questions, we evaluated the performance of the SureSelectXT Human All Exon v6 protocol (Agilent Technologies, Santa Clara, CA) on FFPE samples of different quality.

Three FFPE DNA samples with different DNA Integrity Numbers (DIN 2, 4 and 8) were chosen for the analysis. DIN values range from 1 to 10, where 1 indicates highly degraded and 10 highly intact gDNA.

Prior to library construction, each gDNA was fragmented with 2, 4 or 7 sonication cycles of 30/90 seconds. Profiles of the input and of the sonicated DNA are shown in Fig 1. The further the peak is shifted to the left, the more fragmented the DNA. Increasing the number of sonication cycles clearly increases fragmentation, and the same shearing condition fragments lower-quality inputs much more severely.


Fig 1. Effect of different sonication conditions on the fragmentation of DNA of different quality. All samples were sheared with two, four or seven cycles of 30/90 seconds on a Bioruptor (Diagenode, Belgium). Profiles of both input and sonicated DNA were obtained with a Bioanalyzer DNA HS chip (Agilent Technologies, Santa Clara, CA).

Following library construction, sequencing was performed on a HiSeq 2500 in 125 bp PE mode. On average, 31.8 M reads were generated per sample (min 31 M, max 34 M). BWA-MEM was used to align the raw reads to the hg19 reference. SAMtools and Picard were used for sorting SAM/BAM files and marking duplicates, respectively. The average read alignment rate was 99.84%. Picard was also used to collect high-level alignment metrics for each BAM file, together with the metrics specific to hybrid selection analysis; both are reported in Table 1.
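For readers who want to reproduce a similar workflow, here is a minimal Python sketch that only assembles the command lines for one sample; it does not run them. It is an illustration, not the exact pipeline used for this post. The post names the tools but not the Picard subcommands, so `MarkDuplicates`, `CollectAlignmentSummaryMetrics` and `CollectHsMetrics` are our assumption of the matching ones. The `picard` launcher name, the thread count and every file name (including the bait and target interval lists) are hypothetical placeholders.

```python
def build_pipeline(sample, ref, r1, r2, baits, targets, threads=4):
    """Return shell command lines for one sample, in the order they would run.

    Nothing is executed here; all file names are hypothetical placeholders.
    """
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    return [
        # 1. Align paired-end reads to the reference with BWA-MEM.
        f"bwa mem -t {threads} {ref} {r1} {r2} > {sam}",
        # 2. Coordinate-sort the alignments with SAMtools.
        f"samtools sort -o {sorted_bam} {sam}",
        # 3. Mark duplicates with Picard (assumed subcommand).
        f"picard MarkDuplicates I={sorted_bam} O={dedup_bam} "
        f"M={sample}.dup_metrics.txt",
        # 4. High-level alignment metrics (assumed subcommand).
        f"picard CollectAlignmentSummaryMetrics R={ref} I={dedup_bam} "
        f"O={sample}.aln_metrics.txt",
        # 5. Hybrid selection metrics (assumed subcommand); the bait and
        #    target files must be in Picard interval_list format.
        f"picard CollectHsMetrics R={ref} I={dedup_bam} "
        f"O={sample}.hs_metrics.txt "
        f"BAIT_INTERVALS={baits} TARGET_INTERVALS={targets}",
    ]


if __name__ == "__main__":
    # Hypothetical sample and file names, for illustration only.
    for cmd in build_pipeline(
        sample="ffpe_din2",
        ref="hg19.fa",
        r1="ffpe_din2_R1.fastq.gz",
        r2="ffpe_din2_R2.fastq.gz",
        baits="baits.interval_list",
        targets="targets.interval_list",
    ):
        print(cmd)
```

Printing the commands first makes it easy to check them by eye before pasting them into a shell or a workflow manager.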

The numbers speak for themselves: low-quality inputs fragment more heavily, which reduces insert size and increases duplicate levels, directly affecting overall target coverage.

Table 1. Alignment and hybrid selection metrics.


On this blog we’ve already discussed how to use input quantity as a predictor of read duplication levels, indicating 100 ng of input DNA, or 500 ng of library used for capture, as the minimum quantity needed to achieve adequate results. However, these recommendations were based on the analysis of a good-quality sample and don’t seem to hold for highly degraded samples.

Under the same fragmentation conditions and with the same quantity of input used for hybridization, lower-DIN samples showed higher levels of duplicates (Fig 2).


Fig 2. Duplication levels for samples of different integrity at fixed fragmentation conditions and a fixed quantity of input used in the capture reaction.
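As a reminder of what a duplication level means in practice, the sketch below computes the duplicate fraction from read counts. It assumes the values come from Picard's duplicate-marking metrics and follows the way Picard derives its PERCENT_DUPLICATION field, where each read pair counts as two reads. The function name and the counts in the example are hypothetical and are not data from this experiment.

```python
def duplication_fraction(unpaired_examined, pairs_examined,
                         unpaired_dups, pair_dups):
    """Fraction of examined reads flagged as duplicates.

    Each read pair contributes two reads, mirroring how Picard derives
    PERCENT_DUPLICATION from its duplication metrics.
    """
    total_reads = unpaired_examined + 2 * pairs_examined
    if total_reads == 0:
        raise ValueError("no reads examined")
    return (unpaired_dups + 2 * pair_dups) / total_reads


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not from Table 1).
    frac = duplication_fraction(
        unpaired_examined=0,
        pairs_examined=100,
        unpaired_dups=0,
        pair_dups=25,
    )
    print(f"duplicate fraction: {frac:.2f}")
```

Comparing this fraction across samples only makes sense at a similar sequencing depth, since duplicates accumulate as more reads are drawn from the same library.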

QC of the input DNA is fundamental for obtaining meaningful data. In addition to purity, it is crucial to have integrity information as well. During library construction, smaller fragments are lost in the purification steps (we performed the post adapter-ligation purification with 0.8X AMPure XP beads), leaving an impoverished final library to undergo target capture. This in turn reduces library complexity and coverage efficiency. Thus, we strongly suggest estimating DIN and adjusting both the fragmentation and the subsequent post-ligation bead purification accordingly.

Hope you’ve found this post useful.


Cheers and stay tuned for more updates!