χ-scan: a method to detect mosaic structural variation by examining deviations in SNP relative allele frequencies

Our latest work addresses the need for a method specifically developed for detecting mosaic structural variants (SVs) from next-generation sequencing (NGS) data.

Structural variation (SV) is an important type of genetic variation that contributes to phenotypic differences. It includes deletions, duplications, and copy number variants, as well as insertions, inversions, and translocations of large (>1 kb) segments of DNA. When SVs arise in somatic cells, so that they affect only a portion of the cells of an organism that developed from a single fertilized egg, they give rise to mosaic structural variation.

Mosaic structural variation is involved in human diseases such as cancer, developmental disorders, and autism spectrum disorders, as well as in clonal variation in plants. While existing methods for the detection of somatic point mutations from high-throughput sequencing data show high sensitivity and specificity, the identification of structural variants is still challenging and largely unexplored.

In collaboration with the Institute of Applied Genomics and the Department of Mathematics, Informatics and Physics of the University of Udine, we developed χ-scan, a software tool specifically dedicated to the detection of several kinds of mosaic variants, including presence-absence variants, copy number variants, and chromosomal replacements. We started from the rationale that a mosaic SV should result in a reduction of heterozygosity (ROH) that can be detected using heterozygous SNPs in NGS data. At heterozygous genomic positions, the two alleles are expected to be equally frequent (p = q = 0.5), so heterozygosity is 2pq = 0.5, which is the maximum theoretical value. When a mosaic SV occurs, the frequencies of the two alleles move away from 0.5, leading to a ROH. When SNPs are compared between the populations of cells in two samples derived from the same zygote, such as tumor and normal tissue from a patient, two populations of neuronal cells, or two different plant clones obtained by vegetative propagation, a distortion of SNP allele frequency in one sample relative to the other points to the presence of mosaic SVs.
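The arithmetic behind this rationale can be illustrated with a minimal Python sketch. It is not χ-scan's implementation: the function names are hypothetical, and it assumes the simplest scenario, a heterozygous site at which a given fraction of cells has lost the alternative allele through a deletion.

```python
def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2*p*q at a biallelic site, with q = 1 - p."""
    return 2.0 * p * (1.0 - p)


def alt_freq_after_deletion(cell_fraction: float) -> float:
    """Alternative-allele frequency when `cell_fraction` of the cells
    (hypothetical parameter, between 0 and 1) has lost the alternative allele.

    Assumption: every cell keeps one reference copy, and only the cells
    without the deletion keep one alternative copy.
    """
    ref_copies = 1.0
    alt_copies = 1.0 - cell_fraction
    return alt_copies / (ref_copies + alt_copies)


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 0.5, 1.0):
        p = alt_freq_after_deletion(fraction)
        print(f"cells with deletion={fraction:.2f}  "
              f"alt freq={p:.3f}  heterozygosity={heterozygosity(p):.3f}")
```

With no deletion the alternative allele stays at 0.5 and heterozygosity at its maximum of 0.5. As the fraction of cells carrying the deletion grows, both values fall, reaching zero when every cell carries the deletion.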

Deviations in allele frequency caused by several kinds of mosaic SV. Black and yellow dots represent the reference and alternative alleles in heterozygous SNPs, respectively. The variant allele frequency is calculated as the proportion of reads carrying the variant allele. The graphs in the bottom row are the default graphs produced by χ-scan. The green line represents the variant allele frequency in the wild-type clone, and the red line represents the variant allele frequency in the clone carrying the SV listed in the header (wild type/no SV, deletion, duplication, and chromosome replacement, respectively). Coloured background indicates regions in which the deviation in allele frequency is statistically significant.
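As a small illustration of the quantity plotted in the figure, the variant allele frequency at a single SNP can be computed from read counts as follows. This is only a sketch: the function names and the (reference reads, variant reads) tuple layout are assumptions made for the example, not part of χ-scan, and the statistical test that χ-scan uses to call significant regions is not reproduced here.

```python
def variant_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Proportion of reads at a SNP that carry the variant allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def vaf_shift(wild_type: tuple[int, int], variant_clone: tuple[int, int]) -> float:
    """Difference in variant allele frequency between two samples at one SNP.

    Each argument is a hypothetical (ref_reads, alt_reads) pair.
    """
    return (variant_allele_frequency(*variant_clone)
            - variant_allele_frequency(*wild_type))


if __name__ == "__main__":
    # Illustrative read counts, not real data.
    print(variant_allele_frequency(50, 50))
    print(vaf_shift((50, 50), (60, 30)))
```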

We tested the ability of χ-scan to identify simulated deletions occurring as mosaics in varying proportions of cells by computing F1 scores at decreasing relative abundance of reads carrying the variant allele and using different detection thresholds. The performance of χ-scan was compared with that of other tools for the detection of SVs, such as DNAcopy, BDmax, Control-FREEC, and DELLY. χ-scan showed a significantly higher median F1 score than all other tools in almost all the simulated scenarios.
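For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below shows the standard formula in Python. How true positives, false positives, and false negatives were counted in the benchmark is not described in this summary, so the function takes those counts as given, and its name and arguments are hypothetical.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.

    Returns 0.0 when nothing was called and nothing was expected
    (a convention chosen for this sketch).
    """
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return 0.0
    return 2 * true_pos / denominator


if __name__ == "__main__":
    # Illustrative counts only: 8 simulated deletions recovered,
    # 2 spurious calls, 2 simulated deletions missed.
    print(f1_score(8, 2, 2))
```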

For more details read the full article here.

χ-scan is freely available at https://bitbucket.org/dscaglione/xscan/overview.