De novo sequencing
With the establishment and continuous improvement of NGS technologies, the reconstruction of whole genomes has become affordable over time. Depending on the complexity of the genome, different strategies and technologies can be leveraged for a task that used to require a budget of hundreds of thousands of dollars; it is now feasible for tens of thousands of dollars or less, down to hundreds of dollars for small genomes such as those of bacteria and viruses.
The Illumina sequencing platform has proved to be a reliable system for generating reads with a low per-base error rate, up to 250bp long on its high-throughput systems. However, sequencing such short fragments alone cannot resolve complex and repetitive regions of the genome, where repeats are longer than the reads. A common and cost-effective approach is to generate so-called “mate-pair” libraries: paired reads separated by a longer distance, up to 20kb. These make it possible to link regions of the assembly that are separated by repetitive regions which short reads are unable to resolve.
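The linking role of mate-pairs can be illustrated with a minimal Python sketch. It is not the method of any particular scaffolder: the input format (one tuple of contig names per aligned mate pair), the function name `count_contig_links` and the `min_support` threshold are all assumptions made for illustration. The idea is simply that two contigs separated by an unresolved repeat are likely to be neighbours when several independent mate pairs have one read in each.

```python
from collections import Counter


def count_contig_links(mate_pairs, min_support=2):
    """Count mate-pair links between distinct contigs.

    mate_pairs: iterable of (contig_of_read1, contig_of_read2) tuples,
    one per mate pair with both reads aligned (hypothetical input format).
    Returns a dict mapping each unordered contig pair to its number of
    supporting mate pairs, keeping only pairs with >= min_support links.
    """
    links = Counter()
    for contig_a, contig_b in mate_pairs:
        if contig_a == contig_b:
            # Both mates fall in the same contig: no scaffolding information.
            continue
        # Sort the names so (A, B) and (B, A) count as the same link.
        links[tuple(sorted((contig_a, contig_b)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_support}


# Toy data: three pairs bridge contig1 and contig2, one bridges contig2 and contig3.
pairs = [
    ("contig1", "contig2"),
    ("contig2", "contig1"),
    ("contig1", "contig2"),
    ("contig1", "contig1"),
    ("contig2", "contig3"),
]
print(count_contig_links(pairs, min_support=2))
```

With this toy input only the contig1/contig2 link passes the threshold. A real scaffolder would also use read orientation and the expected insert size to order and orient the contigs and to estimate the gap between them.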
Newer technologies have opened up possibilities to improve the reconstruction of complex genomes, such as those of humans, animals and plants. The key to a nearly complete genome reconstruction is the ability to resolve repeats and to distinguish between alleles and/or homologous regions in polyploid species. Such features add complexity to the assembly procedure that cannot be disentangled with short-sequence information alone. The only way to solve this puzzle is to gather contiguity information (as mate-pairs do to a limited extent) in order to “walk” from one side of an ambiguous region (a repeat) to the other, or to phase sequences from the same chromosome and separate haplotypes accordingly.
Linked reads are regular short reads generated after library preparation on the Chromium system
This information can be obtained with sequencing technologies that produce long reads (Pacific Biosciences, Oxford Nanopore) or with methods that rely on the accuracy of short-read sequencing and use massive barcoding to aggregate information over long molecules of up to 100kbp (Linked reads - 10X Genomics). Reads carrying the same barcode originate from a few long DNA molecules, each coming from a single allele. Sequences can be mapped against a reference genome to phase all SNP and indel variants with long contiguity. On a de novo assembly graph, this method makes it possible to disentangle ambiguous parts and to generate a reference sequence for each of the two haplotypes.
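How shared barcodes carry phase information can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the algorithm used by 10X Genomics software: the data layout (barcode mapped to the alleles observed at heterozygous SNPs, 0 for reference and 1 for alternate), the SNP identifiers and the function name `phase_snp_pair` are hypothetical, and the sketch assumes that within the region considered each barcode tags molecules from a single haplotype.

```python
from collections import Counter


def phase_snp_pair(observations, snp_a, snp_b):
    """Vote on the phase of two heterozygous SNPs using barcode co-occurrence.

    observations: dict mapping barcode -> {snp_id: allele}, with allele
    0 (reference) or 1 (alternate). Hypothetical input format.
    Returns "cis" if the same alleles tend to share a barcode, "trans" if
    opposite alleles do, or None when there is no evidence or a tie.
    """
    votes = Counter()
    for alleles in observations.values():
        # Only barcodes covering both SNPs are informative.
        if snp_a in alleles and snp_b in alleles:
            relation = "cis" if alleles[snp_a] == alleles[snp_b] else "trans"
            votes[relation] += 1
    if votes["cis"] == votes["trans"]:
        return None
    return "cis" if votes["cis"] > votes["trans"] else "trans"


# Toy data: two barcodes support cis, one supports trans, one is uninformative.
obs = {
    "BC01": {"snp1": 0, "snp2": 0},
    "BC02": {"snp1": 1, "snp2": 1},
    "BC03": {"snp1": 0, "snp2": 1},
    "BC04": {"snp1": 1},
}
print(phase_snp_pair(obs, "snp1", "snp2"))
```

Here the majority of informative barcodes place the matching alleles on the same molecule, so the pair is called "cis". Chaining such pairwise calls along a chromosome is one simple way to picture how long-range haplotype blocks arise; production tools use probabilistic models that account for sequencing errors and barcode collisions.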
IGATech has recently acquired the 10X Genomics Chromium technology, which allows its clients to represent haplotypes more faithfully than traditional haploid consensus assembly models. This technology enables the resolution of repeat gaps and the phasing of assemblies. The Chromium de novo Assembly Solution enables phased de novo assembly at low cost, supplanting complex, expensive laboratory and computational workflows and creating the potential to replace standard reference alignment methods.
Every species comes with its caveats. We can offer support from high-molecular-weight (HMW) DNA isolation to high-density genetic maps for genome anchoring.
IGATech has long-standing experience in genome assembly, always adopting the latest technologies, protocols and methods. Our mission is to provide consultancy and expertise to our customers in order to identify the best and most convenient strategy for de novo assembly projects. Whether relying on Illumina-only sequencing or adopting long-range information such as Pacific Biosciences or 10X Genomics, we provide bioinformatics support from assembly to genome annotation and interpretation. When genomes are very large and complex, making long-read sequencing unaffordable, our partnership with NRGene offers you an ace in the hole for reaching a high-quality assembly based on short reads. IGATech is equipped with the latest instruments to provide state-of-the-art sequencing solutions.
I am very pleased with the work you have been doing and your service minded attitude towards this project, a non-standard project that needs a customized approach.
- Preparation and sequencing of paired-end libraries from 125bp to 250bp
- Preparation and sequencing of mate-pair libraries (3kb to 20kb)
- Preparation and sequencing of GemCode libraries (10X Genomics, Chromium system)
- Complete or hybrid project management with SMRT libraries on RSII/Sequel system
- End-to-end genome assembly projects with NRGene
- Genome assembly with state-of-the-art algorithms (k-mer-based, OLC or hybrid)
- Genome anchoring via genetic maps (ddRAD)
- Genome anchoring with Hi-C technology
- Gene prediction and annotation
- Comparative genomics (synteny, pan-genome analysis)