GENOME PLATFORM: An affordable and scalable genomic platform for biomarker monitoring in cancer treatment

The project is financed by Region Friuli Venezia Giulia - PR FESR 2021-2027 with a contribution of 211.310,02 €. CUP code – D29J23000010007

In collaboration with:

CRO - Centro di Riferimento Oncologico, Aviano
Dipartimento di Area Medica, Università degli Studi di Udine

The Critical Role of Genetic Screening in Oncology

In the field of oncology, the ability to detect and monitor genetic mutations plays a critical role in advancing cancer treatment and improving patient outcomes. Genetic screening, particularly through advanced sequencing technologies, has become a cornerstone in the development of precision medicine, enabling tailored therapies that target specific genetic alterations in tumors.

Genetic Screening and Tumor Mutation Monitoring

Genetic screening involves analyzing DNA to identify mutations that drive cancer progression. These mutations can serve as biomarkers, providing valuable information about the tumor's characteristics and behavior. By identifying these genetic alterations, clinicians can better understand the molecular mechanisms underlying each patient's cancer, leading to more effective treatment strategies. Monitoring tumor mutations over time is particularly important in clinical trials, where longitudinal studies can track how tumors evolve in response to treatment. This ongoing surveillance allows researchers and clinicians to observe the emergence of new mutations, assess the effectiveness of therapies, and make informed decisions about treatment adjustments.

Longitudinal Studies and Precision Medicine

Longitudinal studies involve repeatedly measuring the same variables over long periods, providing a comprehensive view of how cancer and its treatment progress. In the context of genetic screening, longitudinal studies enable the continuous monitoring of tumor DNA, capturing the dynamic changes that occur as a result of therapy or disease progression. One of the primary benefits of longitudinal genetic screening is its ability to detect minimal residual disease (MRD) and early signs of relapse. By analyzing circulating tumor DNA (ctDNA) in blood samples, clinicians can identify tumor-derived genetic material at very low concentrations. This sensitivity allows for the detection of mutations that might indicate resistance to current treatments, prompting timely interventions before the disease advances. Precision medicine relies heavily on this real-time data to tailor treatments to the individual needs of patients. Genetic screening and monitoring provide a detailed map of the tumor's genetic landscape, guiding the selection of therapies that are most likely to be effective based on the specific mutations present. This approach minimizes the trial-and-error aspect of cancer treatment, reducing unnecessary side effects and improving overall patient outcomes.

The Importance of Economical Sequencing

To achieve these benefits on a large scale, it is essential to develop an economical and sustainable system for genetic screening. The project aims to create a targeted sequencing method starting from liquid and tissue biopsies that can drastically reduce analysis difficulties and operational costs. This advancement would make genomic data far more accessible, leading to the democratization of precision medicine. Technically, the goal is to improve DNA sequencing technology to accelerate the availability of genomic data in precision oncology pathways, thus supporting therapeutic decision-making processes. The primary objective is to create a method that significantly lowers costs, enabling constant monitoring of responses to anti-tumor therapies. This method aims to improve sensitivity, reduce analytical times, and enhance the ability to identify mutations critical for guiding therapy. Additionally, the economic feasibility of this technique could support post-therapy monitoring programs to evaluate minimal residual disease, provided the technique achieves sufficient sensitivity.

Liquid Biopsy: A Less Invasive Approach

A notable feature of this new methodology is its applicability in analyzing liquid biopsies. Unlike tissue biopsies, which involve surgically removing an accessible tumor mass, liquid biopsies are performed through a simple venous blood draw. Blood contains cell-free DNA (cfDNA), released by cells following cell death or apoptosis. Tumors also release genetic material into the bloodstream, known as circulating tumor DNA (ctDNA). While tissue biopsy remains the current standard, it poses several challenges, including technical, economic, and clinical issues, as well as quality-of-life impacts. Although easier to analyze genomically, tissue biopsy carries potential post-surgical complications, which can add costs to the healthcare system and pose risks to the patient. Conversely, liquid biopsy is minimally invasive and suitable even for pathologies in hard-to-reach sites (like lung adenocarcinoma), with significantly lower costs for collection and post-collection management.

Overall, the importance of genetic screening in monitoring tumor mutations cannot be overstated. It not only drives the evolution of precision medicine but also promises to significantly improve the management and treatment of cancer, leading to better patient outcomes and more efficient healthcare solutions. For these advancements to be effectively integrated into clinical practice, the development of highly sensitive and cost-effective assays is crucial. These assays need to be capable of detecting mutations present in less than 1% of circulating free DNA, ensuring they are affordable enough to be applied on a large scale, making advanced cancer monitoring accessible to a broader patient population.

Current Limitations in cfDNA Analysis through NGS

The analysis of circulating free DNA (cfDNA) through next-generation sequencing (NGS) presents several significant challenges that must be addressed to improve the accuracy and reliability of this method. One of the principal issues is the scarcity of genetic material. A typical blood draw yields approximately 10 ng of cfDNA, which is sufficient for NGS preparation but still limiting for developing complex preparation systems that involve multiple purification steps and multi-stage manipulation.

To achieve a native sensitivity of at least 1% for detecting a mutation against a background of 99% "healthy" genetic material, a sequencing coverage of over 5000X per site is required. However, at these levels of coverage, artifacts become prevalent. These artifacts include sequencing errors and artifacts introduced during the PCR phase. While the former issue can be somewhat mitigated by the release of increasingly accurate sequencing platforms, the latter remains a significant challenge.
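The coverage requirement can be illustrated with a simple binomial model. The sketch below is not part of the project's pipeline. It assumes reads are sampled independently; the function names, the threshold of 10 supporting reads, and the 0.1% per-base error rate are illustrative assumptions, not measured values.

```python
from math import comb


def expected_reads(depth, fraction):
    """Expected number of reads showing an event that occurs at `fraction`."""
    return depth * fraction


def prob_at_least(k, depth, vaf):
    """P(X >= k) for X ~ Binomial(depth, vaf): chance of seeing k+ mutant reads."""
    p_fewer = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    depth = 5000          # coverage per site, as quoted in the text
    vaf = 0.01            # 1% mutant allele fraction
    error_rate = 0.001    # ASSUMPTION: illustrative per-base error rate
    min_reads = 10        # ASSUMPTION: illustrative calling threshold

    print("expected mutant reads:", expected_reads(depth, vaf))
    print("expected erroneous reads:", expected_reads(depth, error_rate))
    print("P(at least min_reads mutant reads):", prob_at_least(min_reads, depth, vaf))
```

Under these assumptions, a 1% variant at 5000X is expected to yield about 50 mutant reads, while the assumed error rate alone would produce about 5 erroneous reads at the same site. That comparison is why artifacts cannot be ignored at this depth.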

Recent advances in high-fidelity DNA polymerases with proofreading activity have lowered PCR error rates, but they do not completely eliminate the problem. Errors introduced in the early cycles of amplification are propagated to all descendant copies, making them indistinguishable from true low-frequency mutations.

Scarcity of Genetic Material

The limited quantity of cfDNA extracted from a typical blood draw poses a significant constraint. With approximately 10 ng of cfDNA available, researchers face limitations in developing sophisticated preparation systems that require multiple purification steps. This limitation hinders the ability to manipulate the cfDNA at multiple stages, which is often necessary to ensure high fidelity in downstream analyses.

High Coverage Requirements and Associated Artifacts

To detect mutations at a frequency as low as 1%, a sequencing depth of over 5000X is necessary. Such high coverage is essential to ensure that rare mutations are reliably detected against the vast majority of wild-type sequences. However, at this level of coverage, sequencing errors and PCR artifacts become prominent.

Sequencing Errors: Advances in sequencing technologies have significantly improved the accuracy of platforms, reducing the overall error rate. Nevertheless, the need for extremely high coverage amplifies the impact of any remaining sequencing errors, complicating the differentiation between true mutations and artifacts.

PCR-Induced Artifacts: PCR amplification, a necessary step in NGS library preparation, introduces errors that can be propagated through subsequent cycles. Even with high-fidelity polymerases that possess excellent proofreading abilities, errors that occur in the initial cycles can be exponentially amplified, making them difficult to distinguish from genuine low-frequency mutations. This challenge underscores the necessity for improved methods to minimize or correct for these errors during the amplification process.
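The effect of the cycle in which a PCR error occurs can be sketched with a toy model. It assumes a single original strand, perfect doubling at every cycle, and exactly one erroneous new strand; real reactions are less efficient, and the function name is hypothetical.

```python
def artifact_fraction(error_cycle, total_cycles):
    """Fraction of strands carrying an error first made in `error_cycle`.

    Toy model (ASSUMPTIONS): one starting strand, perfect doubling per cycle,
    and a single erroneous strand synthesized during `error_cycle`.
    """
    if not 1 <= error_cycle <= total_cycles:
        raise ValueError("error_cycle must be between 1 and total_cycles")
    total, mutant = 1, 0
    for cycle in range(1, total_cycles + 1):
        total *= 2       # every strand is copied once
        mutant *= 2      # erroneous strands are copied faithfully
        if cycle == error_cycle:
            mutant += 1  # one newly made strand acquires the error
    return mutant / total


if __name__ == "__main__":
    for cycle in (1, 2, 3, 10):
        print(cycle, artifact_fraction(cycle, total_cycles=20))
```

In this model an error made in the first cycle ends up in half of all copies of that molecule, whereas an error made in a late cycle is diluted to a negligible fraction. Early errors are therefore the ones that mimic genuine variants.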

Rolling Circle Amplification: Technique and Applications for cfDNA

Rolling Circle Amplification (RCA) is an isothermal nucleic acid amplification technique that efficiently amplifies circular DNA molecules. RCA uses a short DNA or RNA primer to initiate DNA synthesis at a specific site on a circular template. A DNA polymerase with strong strand displacement activity then extends the primer, continuously synthesizing a long single-stranded DNA (ssDNA) composed of tandem repeats of the circular template sequence. This method can produce large quantities of DNA from minimal starting material, making it particularly suitable for applications where sample quantity is limited.

The utility of RCA for cfDNA amplification stems from the inherent characteristics of cfDNA itself. cfDNA typically presents as short fragments in the bloodstream, with an average length of 100-120 base pairs (bp). These fragments are released into the circulation during cell death processes such as apoptosis and necrosis, and they contain valuable genetic information from both normal and tumor cells.

To utilize RCA for cfDNA, the linear cfDNA fragments must first be circularized. This can be achieved by using ligases that join the ends of the cfDNA fragments to form circular DNA molecules. This step is critical because RCA specifically amplifies circular templates. Once the cfDNA is circularized, a short primer complementary to a specific region of the circularized cfDNA is annealed to the template. This primer provides the starting point for the DNA polymerase to begin synthesis.

Figure 1. Schematic representation of rolling circle (isothermal) amplification or RCA. Credit: Berr, Alexandre. (2006). Karyotype evolution and nuclear organization across the genus Arabidopsis.

Using a DNA polymerase with strong strand displacement activity, such as Phi29 DNA polymerase, the primer is extended around the circular template. The polymerase continuously displaces the newly synthesized strand, creating long ssDNA with repeated sequences of the original circular template. The amplified product, consisting of long ssDNA molecules, can then be converted back into double-stranded DNA (dsDNA) if necessary, using complementary primers and DNA polymerases.
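As a purely illustrative string model of this step (not a simulation of the enzyme), the sketch below walks a polymerase around a circular template and emits the complementary strand. The function name and the tiny 4-base circle are hypothetical; the template is written 5' to 3' and read in the opposite direction, so the product is a tandem repeat of its reverse complement.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def rolling_circle_product(circle, primer_pos, n_bases):
    """Return the 5'->3' product of copying `n_bases` around a circular template.

    `circle` is the circular template written 5'->3'; synthesis starts opposite
    index `primer_pos` and proceeds toward decreasing indices, wrapping around.
    """
    length = len(circle)
    return "".join(
        COMPLEMENT[circle[(primer_pos - k) % length]] for k in range(n_bases)
    )


if __name__ == "__main__":
    # Two full laps around a toy 4-base circle give two tandem repeats.
    print(rolling_circle_product("AACG", primer_pos=3, n_bases=8))
```

Because the template has no end, the product length is limited only by how long synthesis continues, which is what allows a short circularized cfDNA fragment to yield a long concatemer.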

RCA offers several benefits for cfDNA analysis. First, it can amplify minute amounts of cfDNA into quantities sufficient for downstream analyses, such as next-generation sequencing (NGS). This is particularly important given the limited quantity of cfDNA typically available from blood samples. Second, RCA operates under isothermal conditions, avoiding the need for thermal cycling required in PCR. This simplifies the amplification process and reduces the potential for introducing thermal cycling artifacts. Finally, by designing specific primers, RCA can selectively amplify target cfDNA sequences, enhancing the detection of specific mutations or biomarkers present at low frequencies.

In summary, RCA provides a powerful and efficient method for amplifying cfDNA, which is often present in small, fragmented forms in the bloodstream. By leveraging the natural fragmentation of cfDNA and the high amplification capacity of RCA, researchers can obtain sufficient genetic material for detailed genomic analyses. This improves the sensitivity and accuracy of cfDNA-based diagnostic and monitoring applications in oncology.

Duplex-seq Methodology: Detailed Protocol and Functioning

The Duplex-seq (Duplex Sequencing) methodology is an advanced sequencing technique designed to enhance the accuracy of mutation detection by differentiating between true variants and sequencing artifacts. This method utilizes unique molecular identifiers (UMIs) and complementary DNA strands to ensure high fidelity in sequencing results.

In the Duplex-seq protocol, DNA is first fragmented, and adapters containing UMIs are ligated to both ends of each fragment. These UMIs serve as unique tags that allow for the tracking of individual DNA molecules throughout the sequencing process. After adapter ligation, the DNA is denatured to separate the complementary strands, which are then individually PCR amplified. The resulting PCR products are sequenced, and the reads are aligned to the reference genome.

A critical aspect of Duplex-seq is its use of paired reads from both strands of the DNA duplex. By comparing the sequences of the complementary strands, true mutations can be distinguished from errors introduced during PCR amplification or sequencing. This is because a true mutation will be present in both strands at the same genomic position, whereas an artifact will typically only appear in one strand.

To enrich specific loci, targeted primers are designed to hybridize to the regions of interest. These primers are used in a targeted PCR step, which amplifies only the regions flanked by the primers, thereby increasing the coverage and depth of sequencing for these specific loci. This targeted enrichment is crucial for achieving the high sensitivity required to detect low-frequency mutations, such as those present in less than 1% of circulating free DNA (cfDNA).

The Duplex-seq process involves several key steps:

Fragmentation and Adapter Ligation: DNA is fragmented, and adapters with UMIs are ligated to the fragments.
Denaturation and PCR Amplification: The DNA is denatured to separate the strands, which are then PCR amplified separately.
Sequencing: The amplified products are sequenced, generating paired-end reads.
Alignment and Error Correction: The paired reads from both DNA strands are aligned to the reference genome. True mutations are identified by comparing the complementary strand sequences, ensuring that only mutations present in both strands are considered true variants.

By leveraging the accuracy of UMIs and the comparative analysis of complementary DNA strands, Duplex-seq provides a highly reliable method for detecting mutations with unprecedented sensitivity.
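The strand-comparison logic can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's pipeline: each read is reduced to a hypothetical tuple of (molecule identifier derived from the UMI, strand of origin, base observed at one genomic position), with bases already expressed in reference orientation.

```python
from collections import Counter, defaultdict


def strand_consensus(bases):
    """Most common base within one single-strand family (ties: first seen)."""
    return Counter(bases).most_common(1)[0][0]


def duplex_consensus(reads):
    """Return {molecule_id: base} for molecules whose two strands agree.

    `reads` is an iterable of (molecule_id, strand, base) tuples, where
    strand is "top" or "bottom". The tuple layout is an ASSUMPTION made
    for this sketch.
    """
    families = defaultdict(lambda: defaultdict(list))
    for molecule_id, strand, base in reads:
        families[molecule_id][strand].append(base)

    calls = {}
    for molecule_id, strands in families.items():
        if "top" not in strands or "bottom" not in strands:
            continue  # only one strand was captured: no duplex can be built
        top = strand_consensus(strands["top"])
        bottom = strand_consensus(strands["bottom"])
        if top == bottom:
            calls[molecule_id] = top  # supported by both strands
        # otherwise the change is strand-specific and treated as an artifact
    return calls


if __name__ == "__main__":
    reads = [
        ("m1", "top", "T"), ("m1", "top", "T"), ("m1", "bottom", "T"),
        ("m2", "top", "T"), ("m2", "bottom", "C"),   # discordant strands
        ("m3", "top", "T"),                          # single strand only
    ]
    print(duplex_consensus(reads))
```

Only molecule `m1` produces a duplex call here: `m2` is discarded because its strands disagree, and `m3` because its sister strand was never sequenced.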

Figure 2. A duplex adapter is generated by synthesizing a complementary strand, which also produces a complementary UMI. The UMI allows the two PCR products originating from the same template molecule to be paired, while a 2 bp heteroduplex in the middle of the conserved region distinguishes their strand of origin (top or bottom). Credit: Peng et al. 2019. Targeted single primer enrichment sequencing with single end Duplex-UMI. Scientific Reports.

Figure 3. Once duplex adapters are ligated to the DNA, an amplification reaction is carried out with a pool of locus-specific primers to enrich for the target loci of the panel. Credit: Peng et al. 2019. Targeted single primer enrichment sequencing with single end Duplex-UMI. Scientific Reports.

Results

We developed a protocol that currently targets a panel of 149 actionable mutations across a set of 44 oncogenes. The protocol is depicted in Figure 3. Because each locus is sequenced over the average length of a cfDNA fragment, about 110 bp around each target mutation can also be explored, allowing for the discovery of novel mutations.

A dedicated bioinformatics pipeline has been deployed to analyze the data and perform an automated multi-consensus strategy. In brief, an unsupervised call is made for a mutation if at least one replicate calls it from duplex-only coverage and at least two replicates provide evidence of the mutated allele in simplex coverage.

Figure 4. Pipeline of the entire NGS workflow. On the left, the wet-lab part starts from the generation of 3 aliquots of the same cfDNA extracted from blood. Each aliquot is then processed independently through all stages of library preparation and sequencing. At the analysis stage (right), each replicate dataset is processed independently, and the final calls are filtered with a consensus strategy that requires only one duplex call but also the presence of the mutation in at least two simplex datasets of the same sample.
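The consensus rule can be written as a short predicate. The sketch below is an illustration, not the deployed pipeline: the function name and the per-replicate boolean inputs are hypothetical, and it assumes that simplex evidence is assessed independently in each replicate, including the one that produced the duplex call.

```python
def unsupervised_call(duplex_calls, simplex_support, min_simplex_replicates=2):
    """Apply the replicate consensus rule to one candidate mutation.

    duplex_calls:    one bool per replicate, True if the mutation was called
                     from duplex-only coverage in that replicate.
    simplex_support: one bool per replicate, True if the mutated allele is
                     seen in simplex coverage in that replicate.
    """
    has_duplex_call = any(duplex_calls)
    simplex_replicates = sum(bool(s) for s in simplex_support)
    return has_duplex_call and simplex_replicates >= min_simplex_replicates


if __name__ == "__main__":
    # Three replicates of the same sample.
    print(unsupervised_call([True, False, False], [True, True, False]))
    print(unsupervised_call([False, False, False], [True, True, True]))
```

Requiring a duplex call guards against single-strand artifacts, while the simplex requirement across replicates guards against an error that arose in a single library preparation.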

By using the RCA method, it was possible to obtain more than 200 ng of template DNA for most samples starting from as little as 7 ng. Figure 4 reports a batch of 21 samples showing typical performance, with an average template amplification factor of 51X, ranging from 21X to 75X.

For each sample replicate about 100M reads (50M paired-end fragments) were generated.

The base composition plots reported in Figures 5 and 6 show how the final library is built. In short, the R1 read starts with the 12 bp UMI, followed by a fixed sequence containing two strand-specific bases (CC | TT). After the fixed adapter comes the genomic template sequence, attached via ligation. In the R2 read, which is generated from the side where the locus-specific primers enrich for the desired targets, the base composition reflects a 20 bp linker sequence followed by the multiplex primer sequence. It is possible to notice, however, that a small fraction of fragments are non-canonical, possibly generated by the interaction of the many primers with the UMI, whose random sequence can provide stochastic annealing sites. For this reason, a thorough series of clean-up steps was necessary to limit this phenomenon as much as possible by removing all free adapters and primers at each step of the process.

The construction of duplex data is a loss-intensive analysis, since a duplex consensus sequence can be built only when both UMI-tagged strands originating from the same DNA molecule are captured. High duplex coverage comes with a trade-off. A high amount of input DNA lowers the likelihood of finding sister sequences in the pool; a low amount of input DNA, and therefore fewer template molecules, makes it easier to find sister molecules once sequenced, at the cost of lower library complexity and hence lower final deduplicated coverage (i.e. inefficient use of raw coverage). In other words, to capture all sister sequences a library must be sequenced until its complexity is completely exhausted, meaning that >99% of the data is duplicated readout. In our optimization process we obtained the best results with an input DNA of 200-400 ng. With a panel of about 150 loci and 50M fragments sequenced, the protocol provided an average raw-to-duplex ratio of 6.5 ± 1.8%. To simplify, a raw amplicon coverage of 10000X on a given locus leads to a duplex coverage of 650X. An ideal duplex coverage would be 1000X, which would make it possible to rely statistically on 10 duplex reads for the detection of a mutation present at 1% in the template DNA.
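The arithmetic behind these figures can be reproduced with a small sketch. The function names are hypothetical; the 6.5% ratio, the 10000X and 1000X coverages, and the 1% allele fraction are the values quoted above, and the ratio is treated as a fixed constant even though it was measured as 6.5 ± 1.8%.

```python
RAW_TO_DUPLEX = 0.065  # average raw-to-duplex ratio reported in the text


def duplex_coverage(raw_coverage, ratio=RAW_TO_DUPLEX):
    """Duplex coverage obtained from a given raw amplicon coverage."""
    return raw_coverage * ratio


def raw_coverage_needed(target_duplex, ratio=RAW_TO_DUPLEX):
    """Raw amplicon coverage required to reach a target duplex coverage."""
    return target_duplex / ratio


def expected_mutant_duplex_reads(duplex_cov, allele_fraction):
    """Expected number of duplex reads carrying the mutation."""
    return duplex_cov * allele_fraction


if __name__ == "__main__":
    print("duplex coverage at 10000X raw:", duplex_coverage(10_000))
    print("raw coverage needed for 1000X duplex:", raw_coverage_needed(1_000))
    print("mutant duplex reads at 1000X, 1%:",
          expected_mutant_duplex_reads(1_000, 0.01))
```

At the reported ratio, reaching the ideal 1000X duplex coverage would take roughly 15,400X raw coverage per locus, about one and a half times the 10000X used in the example.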

In this project we successfully developed an assay capable of detecting a panel of 150 actionable point mutations at a 1% frequency in the specimen, starting from 5-10 ng of cfDNA from blood and requiring about 100M NGS reads. The size of the panel has proven flexible: it can be extended simply by including new primers in the pools.

The key benefits of this assay are:

Not bound to any reagent vendor, technology or patent.
Only requires access to a benchtop sequencer; being targeted it does not require generation of full deep genome coverage.
Sensitivity is down to 1% allelic fraction.