The "Holy Grail"
In 2001, the human genome was published. Private and public initiatives competed for this outcome, adopting different strategies. Both relied on the same sequencing technology: Sanger chemistry, which produces long reads (up to 800bp) with accurate base calling. Thanks also to the extensive work invested in the finishing steps, the results were highly accurate genome sequences that standard assemblies produced today with so-called second-generation technologies still cannot surpass.
With the establishment and continuous improvement of NGS technologies, reconstructing whole genomes has become increasingly affordable over time. Depending on the complexity of the genome, different strategies and technologies can be leveraged. A task that once required a budget of hundreds of thousands of dollars is now feasible for tens of thousands of dollars or less, down to a few hundred dollars for small genomes such as those of bacteria and viruses.
Long-range information enables the independent assembly of haplotypes. Within-sample structural variation is now accessible with a single assembly exercise.
The Illumina sequencing platform has proven to be a reliable system for generating low-error-rate reads up to 250bp long on its high-throughput instruments. To reconstruct most of a genome, reads must be produced with redundancy, usually a total of 50 to 100 times the genome size (50x to 100x coverage). This lowers the probability of leaving regions uncovered and provides enough overlap between reads to build contiguous sequences (contigs) and scaffolds.
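The redundancy requirement can be made concrete with the classic Lander-Waterman idealization: if reads land uniformly at random, per-base depth follows a Poisson distribution whose mean is the coverage c, so the expected fraction of bases hit by no read is e^(-c). The sketch below is illustrative only. The 1 Gbp genome size is a hypothetical example, and real libraries deviate from uniform placement (GC bias, repeats, low-complexity regions), which is one reason practical targets sit well above what the Poisson model alone would suggest.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, target_coverage: float) -> int:
    """Number of reads whose total length reaches target_coverage x genome size."""
    # Coverage c = N * L / G, hence N = c * G / L (rounded up to whole reads)
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


def expected_uncovered_fraction(coverage: float) -> float:
    """Idealized (Poisson) fraction of bases that no read covers: P(depth = 0) = e^-c."""
    return math.exp(-coverage)


if __name__ == "__main__":
    genome_size = 1_000_000_000  # hypothetical 1 Gbp genome
    read_length = 250            # Illumina read length mentioned in the text
    for c in (10, 50, 100):
        n = reads_needed(genome_size, read_length, c)
        print(f"{c:>3}x: {n:,} reads, expected uncovered fraction {expected_uncovered_fraction(c):.3g}")
```

Under this model even 10x would leave very few bases uncovered; the higher coverage used in practice compensates for non-uniform sampling and supplies the read-to-read overlaps the assembler needs.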
However, sequencing such short fragments alone cannot resolve complex and repetitive regions of the genome where repeats are longer than the reads. A common and cost-effective approach is to generate so-called mate-pair libraries: pairs of reads separated by a longer distance, up to 20kb. These make it possible to link regions of the assembly that are separated by repeats short reads cannot resolve. Nevertheless, the repetitiveness of complex eukaryotic genomes still hampers the reconstruction of long genomic segments, limiting contiguity.
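A mate pair whose two reads land on different contigs also carries a distance estimate. As a minimal sketch, assuming the library's insert size is known, the reads are already mapped, and both contigs are oriented so that contig A precedes contig B, the gap between the contigs follows from subtracting the portion of the insert that falls on each contig. `MateLink` and `estimate_gap` are hypothetical names for illustration, not the API of any particular scaffolder.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class MateLink:
    """One mate pair whose reads land on two different contigs (hypothetical structure).

    Coordinates assume both contigs are already oriented so that A precedes B:
    - start_on_a: 0-based position on contig A where the sequenced fragment begins
    - end_on_b:   0-based, exclusive position on contig B where the fragment ends
    """
    start_on_a: int
    end_on_b: int


def estimate_gap(len_a: int, links: list[MateLink], insert_size: int) -> float:
    """Estimate the unknown gap between the end of contig A and the start of contig B.

    For each link: insert_size = (len_a - start_on_a) + gap + end_on_b,
    so gap = insert_size - (len_a - start_on_a) - end_on_b.
    A negative result suggests the two contig ends overlap.
    """
    if not links:
        raise ValueError("at least one mate-pair link is required")
    per_link = [insert_size - (len_a - link.start_on_a) - link.end_on_b for link in links]
    # The median damps the effect of insert-size spread and occasional mis-mapped pairs
    return median(per_link)
```

Taking the median over many links, rather than trusting a single pair, is a simple way to absorb the spread of insert sizes in a real mate-pair library.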
Newer technologies have opened up possibilities for improving the reconstruction of complex genomes, such as those of humans, animals and plants. The key to a nearly complete and exhaustive genome reconstruction is the ability to resolve repeats and to distinguish between alleles and/or homologous regions in polyploid species. These features add complexity to the assembly that cannot be disentangled with short sequence information alone. The only way to solve this puzzle is to gather long-range contiguity information (as mate pairs do, to a limited extent), either to “walk” from one side of an ambiguous region (a repeat) to the other, or to phase sequences from the same chromosome and separate the haplotypes accordingly. This information can be obtained from sequencing technologies that produce long reads (Pacific Biosciences, Oxford Nanopore), or from methods that rely on the accuracy of short-read sequencing and aggregate information from long molecules of up to 100kbp through massive barcoding (10X Genomics).
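Barcoded linked reads carry long-range information because reads sharing a barcode within a limited genomic window most likely come from the same long input molecule. The sketch below is a simplified illustration, not 10X Genomics' own pipeline: it groups aligned reads by barcode and contig and splits each group wherever consecutive reads are farther apart than a hypothetical `max_gap` threshold. Each surviving group approximates one molecule whose span can then link contigs or help phase variants.

```python
from collections import defaultdict
from typing import NamedTuple


class LinkedRead(NamedTuple):
    barcode: str  # partition barcode attached to the read
    contig: str   # contig (or reference sequence) the read aligned to
    pos: int      # 0-based alignment start


class Molecule(NamedTuple):
    barcode: str
    contig: str
    start: int    # alignment start of the first read in the group
    end: int      # alignment start of the last read in the group (approximate span)
    n_reads: int


def infer_molecules(reads, max_gap=50_000, min_reads=2):
    """Group reads sharing a barcode into putative long input molecules.

    max_gap and min_reads are hypothetical defaults chosen for illustration.
    """
    positions_by_key = defaultdict(list)
    for read in reads:
        positions_by_key[(read.barcode, read.contig)].append(read.pos)

    molecules = []
    for (barcode, contig), positions in positions_by_key.items():
        positions.sort()
        group = [positions[0]]
        # Start a new molecule whenever the next read is too far from the previous one
        for pos in positions[1:]:
            if pos - group[-1] > max_gap:
                if len(group) >= min_reads:
                    molecules.append(Molecule(barcode, contig, group[0], group[-1], len(group)))
                group = [pos]
            else:
                group.append(pos)
        # Flush the last open group; singletons are discarded as likely noise
        if len(group) >= min_reads:
            molecules.append(Molecule(barcode, contig, group[0], group[-1], len(group)))
    return molecules
```

Because the same barcode is reused across many molecules, the gap threshold is what separates two distinct molecules that happen to share a barcode on the same contig.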
Every species comes with its own caveats. We can offer support from high-molecular-weight (HMW) DNA isolation to high-density genetic maps for genome anchoring.
IGATech has long-standing experience in genome assembly and always adopts the latest technologies, protocols and methods. Our mission is to provide consultancy and expertise to our customers in order to identify the best and most cost-effective strategy for their de novo assembly projects. Whether a project relies on Illumina-only sequencing or adopts long-spanning information such as Pacific Biosciences or 10X Genomics, we provide bioinformatics support from the assembly exercise to genome annotation and interpretation. When genomes are so large and complex that long-read sequencing becomes unaffordable, our partnership with NRGene provides an ace in the hole for reaching a high-quality assembly based on short reads. IGATech is equipped with the latest instruments to provide state-of-the-art sequencing solutions.
I am very pleased with the work you have been doing and your service-minded attitude towards this project, a non-standard project that needs a customized approach.
- Preparation and sequencing of paired-end libraries (125bp to 250bp reads)
- Preparation and sequencing of mate-pair libraries (3kb to 20kb)
- Preparation and sequencing of GemCode libraries (10X Genomics, Chromium system)
- Complete or hybrid project management with SMRT libraries on RS II/Sequel systems
- End-to-end genome assembly projects with NRGene
- Genome assembly with state-of-the-art algorithms (k-mer-based, OLC or hybrid)
- Genome anchoring via genetic maps (ddRAD)
- Genome anchoring with Hi-C technology
- Gene prediction and annotation
- Comparative genomics (synteny, pan-genome analysis)
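k-mer-based assemblers typically build a de Bruijn graph, in which each k-mer observed in the reads becomes an edge between its (k-1)-mer prefix and suffix, and contigs are spelled out along non-branching paths (unitigs). The toy sketch below assumes error-free reads from a single strand and ignores reverse complements, coverage and graph cleaning, all of which real assemblers must handle; it is not the algorithm of any specific tool. It also shows why repeats limit contiguity: a (k-1)-mer shared by two genomic contexts becomes a branch point, and the walk has to stop there.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer seen in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell out maximal non-branching paths (isolated perfect cycles are skipped)."""
    indegree = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    nodes = set(graph) | set(indegree)

    def is_simple(node):
        # Exactly one incoming and one outgoing edge: the path can continue unambiguously
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in nodes:
        if is_simple(node):
            continue  # paths only start at branch points, sources or sinks
        for nxt in graph.get(node, ()):
            path = node + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return contigs
```

For example, the reads `TTACG` and `TTACT` with k = 3 share the prefix `TTAC`, so the graph branches after `AC` and the sketch returns three unitigs (`TTAC`, `ACG`, `ACT`) in arbitrary order instead of a single contig.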