Eukaryotic genomes contain many repetitive sequences (for example, in centromeres and telomeres) as well as long gaps, and these increasingly expose the shortcomings of existing sequencing technologies. Second-generation sequencing (NGS) can cheaply generate large amounts of data, but its short read length prevents a simple, intuitive view of structural changes across the whole genome. PacBio third-generation sequencing, with its long reads, largely makes up for the problems caused by short reads, for example in assembling repetitive sequences and identifying structural variations. However, both NGS and PacBio are still obstructed by repetitive regions; they differ only in how long a repeat or gap must be before it blocks them, and in how severely it does so. Moreover, although fragmented reads can in principle be pieced together into a complete genome, assembly from sequencing data alone still cannot fully overcome these limitations, so a "perfect" assembly often remains an ideal rather than a reality. An auxiliary technology is therefore needed to come to the rescue, and that is the star of this issue: single-molecule optical mapping.
Single-molecule optical mapping builds ordered, whole-genome maps of restriction enzyme sites from individual DNA molecules. It provides a macro-scale framework that reflects the structure of the entire genome and can help ensure the accuracy and authenticity of an assembly, bringing it closer to the true "Mona Lisa." At present, single-molecule optical mapping is mainly applied to assisted genome assembly (hybrid scaffolding) and to the detection of large structural variations (large SVs).
The main providers of single-molecule optical mapping have been OpGen and BioNano. OpGen launched the Argus system in 2010. It immobilizes single DNA molecules on the surface of a MapCard and digests them in situ with a restriction endonuclease, so the cut fragments stay in their original order. The fragments are then stained with a fluorescent dye and imaged under a fluorescence microscope to record the size and order of every restriction fragment, and overlapping single-molecule maps are assembled into a whole-genome restriction map. BioNano's Irys/Saphyr systems instead use enzymes to recognize specific sites on the DNA and mark them with fluorescent labels, then electrophoretically drive the labeled molecules into nanochannel arrays, where they are linearized and imaged at high resolution over ultra-long stretches, producing a map of label-site positions. OpGen has since largely exited the market, so BioNano's platform is currently the only single-molecule optical mapping technology still in broad use.
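Because the Argus approach keeps the cut fragments in their original order, an ordered list of fragment sizes is already a restriction map: the cut sites are simply running totals. The minimal sketch below assumes fragment sizes (in bp) have already been estimated from the images; the function name is hypothetical and this is not OpGen's software.

```python
from itertools import accumulate


def cut_sites_from_fragments(fragment_sizes_bp: list[int]) -> tuple[list[int], int]:
    """Convert ordered restriction-fragment sizes into cut-site coordinates.

    The fragments keep their original order, so the running total of their
    sizes marks each cut site. The last running total is the molecule's full
    length rather than a cut.
    """
    cumulative = list(accumulate(fragment_sizes_bp))
    if not cumulative:
        return [], 0
    return cumulative[:-1], cumulative[-1]


# Hypothetical molecule cut into fragments of 12, 8 and 25 kb.
sites, length = cut_sites_from_fragments([12_000, 8_000, 25_000])
print(sites, length)  # [12000, 20000] 45000
```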
BioNano's single-molecule optical mapping uses a nicking endonuclease to break the phosphodiester backbone of one strand at specific recognition sites in genomic DNA (gDNA), so the double-stranded molecule itself is not broken. A DNA polymerase with strand-displacement activity then incorporates fluorescently labeled nucleotides at each nick, and DNA ligase repairs the remaining nicks; in parallel, the whole DNA backbone is stained with a fluorescent dye.
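To compare imaged molecules against a known sequence, label positions can be predicted in silico from the enzyme's recognition motif. The sketch below assumes the 7-bp motif GCTCTTC (the site commonly cited for the nicking enzyme Nt.BspQI) purely as an example and ignores optical resolution limits; the function names are illustrative, not part of any BioNano tool. A nicking enzyme cuts only one strand, but its motif can lie on either strand, so both orientations are searched.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def label_positions(seq: str, motif: str = "GCTCTTC") -> list[int]:
    """Sorted 0-based start positions of the motif on either strand.

    ASSUMPTION: GCTCTTC is an example nicking motif; every hit is treated as
    one label, with no merging of sites closer than the optical resolution.
    """
    seq = seq.upper()
    targets = {motif.upper(), reverse_complement(motif)}
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] in targets]


def label_intervals(positions: list[int]) -> list[int]:
    """Distances between consecutive labels: the pattern an optical map records."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Toy sequence: motif, 10 bp, reverse-complement motif, 5 bp, motif.
demo = "AA" + "GCTCTTC" + "A" * 10 + "GAAGAGC" + "T" * 5 + "GCTCTTC" + "AA"
pos = label_positions(demo)
print(pos, label_intervals(pos))  # [2, 19, 31] [17, 12]
```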
Image Source: genomap-tech
High-throughput, high-resolution imaging of long single DNA molecules yields molecules hundreds of kilobases (kb) to megabases (Mb) long, each carrying highly valuable structural information in the form of its pattern of enzyme recognition (label) sites. With sufficient coverage, these molecules can be assembled into a preliminary sketch of the genome.
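Whether the data are "sufficient" is usually judged by effective coverage: the total length of usable molecules divided by the genome size. The filter thresholds below (150 kb minimum length, at least 9 labels) are hypothetical placeholders chosen for illustration, not values given in this article.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    length_bp: int   # imaged molecule length
    n_labels: int    # number of fluorescent labels detected on it


def effective_coverage(molecules, genome_size_bp, min_length_bp=150_000, min_labels=9):
    """Return (coverage, molecules kept) after a simple quality filter.

    ASSUMPTION: both thresholds are hypothetical; coverage is the summed
    length of the kept molecules divided by the genome size.
    """
    kept = [m for m in molecules
            if m.length_bp >= min_length_bp and m.n_labels >= min_labels]
    total_bp = sum(m.length_bp for m in kept)
    return total_bp / genome_size_bp, len(kept)


run = [
    Molecule(200_000, 10),
    Molecule(300_000, 12),
    Molecule(100_000, 20),   # too short: dropped
    Molecule(250_000, 5),    # too few labels: dropped
]
print(effective_coverage(run, genome_size_bp=1_000_000))  # (0.5, 2)
```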
Image Source: genomap-tech
The advantages of optical mapping are its lack of sequence bias and its ultra-long reads, which can completely span repeat units and variable regions. For instance, optical mapping can accurately determine the full length of the human MHC region and the sizes of the gaps within it. This gives it significant technical advantages for de novo genome assembly and for analyzing large structural variations.
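Two ideas underlie this: a molecule is only informative about a repeat or gap if it is anchored by labels on both sides, and once it is, the distance it measures between the flanking labels fixes the gap size. The sketch below is a simplified illustration under those assumptions; the function names and the two-label anchoring threshold are hypothetical.

```python
def spans_region(label_ref_positions, region_start, region_end, min_flank_labels=2):
    """True if an aligned molecule has enough labels on BOTH sides of a region.

    label_ref_positions: reference coordinates of the molecule's aligned labels.
    ASSUMPTION: min_flank_labels is an illustrative anchoring threshold.
    """
    left = sum(1 for p in label_ref_positions if p < region_start)
    right = sum(1 for p in label_ref_positions if p > region_end)
    return left >= min_flank_labels and right >= min_flank_labels


def estimate_gap_size(anchor_distance_on_molecule, left_anchor_to_gap, gap_to_right_anchor):
    """Gap length implied by a molecule that spans it.

    The molecule measures the distance between two labels flanking the gap;
    subtracting the known sequence on each side leaves the gap size.
    A negative result means the flanking sequences overlap.
    """
    return anchor_distance_on_molecule - left_anchor_to_gap - gap_to_right_anchor


print(spans_region([100, 500, 2_000, 9_000, 12_000], 1_000, 8_000))  # True
print(estimate_gap_size(60_000, 10_000, 15_000))  # 35000
```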
One important application of optical mapping is assisting genome assembly. A well-known drawback of assembling short second-generation reads is that the result is highly fragmented and may deviate from the true genome structure. With the Irys/Saphyr system, the original scaffolds can be anchored to the optical map and joined into longer super-scaffolds, bringing the assembly closer to the real genome.
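The anchoring step can be pictured as: place each contig on the optical map, order the contigs by map position, orient them, and fill the space between neighbours with runs of N sized from the map. The sketch below assumes the placements (start coordinate and orientation) have already been computed by an alignment step; it is a toy illustration, not BioNano's Hybrid Scaffold pipeline, and `min_gap` is a hypothetical padding rule.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


@dataclass
class Placement:
    name: str
    seq: str          # contig sequence (uppercase)
    map_start: int    # optical-map coordinate of the contig's first base (bp)
    reverse: bool     # True if the contig aligns to the map in reverse


def oriented(p: Placement) -> str:
    """Contig sequence in the orientation implied by the map."""
    return p.seq.translate(COMPLEMENT)[::-1] if p.reverse else p.seq


def build_super_scaffold(placements, min_gap=1):
    """Order contigs by map position and join them with N-runs sized from the map.

    ASSUMPTION: gaps shorter than min_gap (including negative gaps, i.e.
    apparent overlaps) are padded with min_gap Ns instead of being merged.
    """
    ordered = sorted(placements, key=lambda p: p.map_start)
    parts, prev_end = [], None
    for p in ordered:
        if prev_end is not None:
            parts.append("N" * max(p.map_start - prev_end, min_gap))
        parts.append(oriented(p))
        prev_end = p.map_start + len(p.seq)
    return "".join(parts)


contigs = [
    Placement("ctg_A", "ACGT", 0, False),
    Placement("ctg_B", "GGA", 10, True),
    Placement("ctg_C", "TT", 6, False),
]
print(build_super_scaffold(contigs))  # ACGTNNTTNNTCC
```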
Optical mapping helps preserve the correct long-range order of sequences assembled from low-coverage NGS data. In one reported comparison, combining optical mapping with 50x NGS data produced a better assembly than 80x NGS data alone, and using optical maps to correct NGS assemblies yields more reliable results. Optical mapping is also increasingly combined with third-generation sequencing, a pairing widely regarded as one of the strongest currently available for de novo assembly: both the Chinese genome published by Jinan University [1] and the Korean genome published by Seoul National University [2] used this combination.
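The text does not say which metric defines a "better" assembly; one common measure of contiguity is N50, the length at which, going from the longest sequence to the shortest, the running total first reaches half of the assembly. The sketch below uses hypothetical scaffold lengths to show how joining scaffolds raises N50 while the total length stays the same.

```python
def n50(lengths):
    """Length at which the running total of sorted (longest-first) lengths
    first reaches half of the total assembly length; 0 for an empty list."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


before = [100, 80, 50, 30, 20, 10]   # hypothetical scaffold lengths (kb)
after = [180, 80, 30]                # same bases after joining pairs (Ns ignored)
print(n50(before), n50(after))       # 80 180
```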
A growing body of evidence suggests that de novo assembly of whole human genomes is becoming increasingly important, because reference-based resequencing misses too much valuable information [3]. Researchers are combining optical mapping with 10x Genomics Chromium linked-read technology to obtain high-quality personal de novo genomes at relatively low cost [4], bringing the era of personal genomics within reach.
Another important application of optical mapping is structural variation analysis. It is widely expected that future genome analysis will need to characterize structural variations at high quality rather than stop at SNPs. Genomic structural variations (SVs) typically refer to deletions, insertions, duplications, inversions and translocations of DNA segments, as well as changes in DNA copy number (CNVs), larger than 1 kb. Some researchers instead define SVs as variants larger than 50 bp and reserve the term large-scale structural variation (large SV) for those larger than 1 kb.
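The two size conventions can be written as a small classifier. The thresholds come from the text; the strict "greater than" comparisons follow its wording ("over 50 bp", "over 1 kb"), although some pipelines use "greater than or equal to" instead. The function name is illustrative.

```python
def classify_by_size(length_bp, sv_min_bp=50, large_sv_min_bp=1_000):
    """Size class of a variant: 'small variant', 'SV' (>50 bp) or 'large SV' (>1 kb).

    Setting sv_min_bp=large_sv_min_bp reproduces the stricter convention in
    which only variants over 1 kb count as SVs at all.
    """
    if length_bp > large_sv_min_bp:
        return "large SV"
    if length_bp > sv_min_bp:
        return "SV"
    return "small variant"


print([classify_by_size(s) for s in (3, 500, 14_300, 5_100_000)])
# ['small variant', 'SV', 'large SV', 'large SV']
```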
SVs play an important role in human genetic diversity and disease susceptibility. Examples include duplication of the PLP1 gene in Pelizaeus-Merzbacher disease, the diverse SV-driven mechanisms underlying Charcot-Marie-Tooth (CMT) disease, and a deletion in the region flanking the EPAS1 gene associated with adaptation to the Tibetan plateau. The Irys system detected more than 600 large SVs (>1 kb) in a single human genome, most of which affect protein-coding sequence [5]. Optical mapping has also produced a comprehensive map of SV inheritance in a family trio (NA12878, NA12891 and NA12892), uncovering many previously unreported SVs and demonstrating the clear advantages of optical mapping for SV analysis [6]. Still, capturing all variation comprehensively requires combining several methods, such as optical mapping, Hi-C, and second- and third-generation sequencing [7].
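A simplified view of how optical maps reveal insertions and deletions: after a sample molecule or consensus map is aligned to the reference map, each pair of matched labels defines an interval in both; if the sample interval is much longer than the reference interval, something was inserted, and if much shorter, something was deleted. The sketch assumes the alignment is already done and uses a fixed, hypothetical size threshold in place of the statistical models real SV callers use.

```python
def call_indels(aligned_pairs, min_size_bp=1_000):
    """Flag insertions/deletions from label intervals of an aligned map.

    aligned_pairs: (reference_position, query_position) for matched labels,
    sorted by reference position. ASSUMPTION: a fixed size threshold stands
    in for a proper statistical test.
    """
    calls = []
    for (r1, q1), (r2, q2) in zip(aligned_pairs, aligned_pairs[1:]):
        ref_len = r2 - r1
        qry_len = abs(q2 - q1)          # abs() also handles reverse alignments
        delta = qry_len - ref_len
        if delta >= min_size_bp:
            calls.append(("insertion", r1, r2, delta))
        elif delta <= -min_size_bp:
            calls.append(("deletion", r1, r2, -delta))
    return calls


pairs = [(0, 0), (10_000, 10_050), (25_000, 40_050), (40_000, 45_050)]
print(call_indels(pairs))
# [('insertion', 10000, 25000, 15000), ('deletion', 25000, 40000, 10000)]
```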
SVs are increasingly recognized as biomarkers for complex diseases. Consider genetic disease first: clinical exome sequencing currently yields a molecular diagnosis for only about 26% of patients with suspected rare Mendelian disorders [8]. Most cases therefore remain without a clearly identified causal locus, and some of them may be caused not by SNPs but by SVs. To address this, institutions such as UCLA study undiagnosed rare genetic diseases, using optical mapping and other technologies to test whether these diseases are caused by genomic SVs. One of their findings is that SVs can serve as biomarkers for Duchenne muscular dystrophy (DMD): one such SV is an inversion of up to 5.1 Mb, reported as the first use of optical mapping to diagnose disease within a population cohort. This further supports the value of optical mapping for discovering disease biomarkers.
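An inversion such as the one described leaves a characteristic signature in an alignment to the reference map: inside the inverted block, sample label coordinates run in the opposite direction to the rest of the alignment. The sketch below finds such runs under simplifying assumptions (a single aligned molecule or map, labels already matched, a hypothetical minimum of three labels); it is not a description of any published caller.

```python
def find_inversions(aligned_pairs, min_labels=3):
    """Reference intervals where aligned labels run in the opposite direction.

    aligned_pairs: (reference_position, query_position), sorted by reference.
    ASSUMPTION: min_labels is an illustrative threshold; real callers also
    score the alignments flanking the inverted block.
    """
    steps = [q2 - q1 for (_, q1), (_, q2) in zip(aligned_pairs, aligned_pairs[1:])]
    forward = sum(s > 0 for s in steps) >= sum(s < 0 for s in steps)

    def opposed(step):
        return step < 0 if forward else step > 0

    inversions, i = [], 0
    while i < len(steps):
        if not opposed(steps[i]):
            i += 1
            continue
        j = i
        while j < len(steps) and opposed(steps[j]):
            j += 1
        # steps i..j-1 are reversed, so labels i..j lie in the inverted block
        if j - i + 1 >= min_labels:
            inversions.append((aligned_pairs[i][0], aligned_pairs[j][0]))
        i = j
    return inversions


ref = [0, 10_000, 20_000, 30_000, 40_000, 50_000, 60_000, 70_000]
qry = [0, 10_000, 20_000, 50_000, 40_000, 30_000, 60_000, 70_000]
print(find_inversions(list(zip(ref, qry))))  # [(30000, 50000)]
```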
In prostate cancer research, optical mapping has likewise uncovered previously unreported SVs that lead to gene fusions [9]. The authors applied optical mapping to a prostate cancer patient with a clear family history who was negative for known biomarkers (SPOP, FOXA1, IDH1) and identified 85 large SVs, 37 of which affected tumor-related genes. Among them was a 14.3 kb deletion between DUSP1 and C2orf78 that leads to a gene fusion, which may serve as a new diagnostic and therapeutic target for prostate cancer.
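One simple way a deletion can create a fusion is by removing the 3' end of one gene and the 5' end of its neighbour, so that the surviving parts are joined. The sketch below screens a deletion call for such gene pairs. Gene names and coordinates are hypothetical placeholders, not the genes or positions reported in [9], and reading frame and exon structure are ignored.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def fusion_candidates(genes, del_start, del_end):
    """Same-strand gene pairs whose facing ends are both removed by a deletion.

    A 'left' gene begins before the deletion and ends inside it; a 'right'
    gene begins inside it and ends after it. ASSUMPTION: exon structure and
    reading frame are ignored.
    """
    left = [g for g in genes if g.start < del_start <= g.end <= del_end]
    right = [g for g in genes if del_start <= g.start <= del_end < g.end]
    return [(a.name, b.name) for a in left for b in right if a.strand == b.strand]


# Hypothetical genes and deletion coordinates, for illustration only.
genes = [
    Gene("GENE_A", 1_000, 5_000, "+"),
    Gene("GENE_B", 12_000, 20_000, "+"),
    Gene("GENE_C", 30_000, 40_000, "+"),
]
print(fusion_candidates(genes, del_start=4_000, del_end=13_000))  # [('GENE_A', 'GENE_B')]
```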
Optical mapping therefore has inherent advantages in personal genome assembly and in screening for large SVs. It improves the accuracy of assemblies built from raw sequencing data, supports differentiated development across the industry, and in turn advances research and applications in precision medicine.
References:
[1] Shi L, Guo Y, Dong C, Huddleston J, Yang H, Han X, et al. Long-read sequencing and de novo assembly of a Chinese genome. Nat Commun 2016;7:12065.
[2] Seo J-S, Rhie A, Kim J, Lee S, Sohn M-H, Kim C-U, et al. De novo assembly and phasing of a Korean human genome. Nature 2016;538:243-7.
[3] Li Y, Zheng H, Luo R, Wu H, Zhu H, Li R, et al. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly. Nat Biotechnol 2011;29:723-30.
[4] Mohr DW, Naguib A, Weisenfeld N, Kumar V, Shah P, Church DM, et al. Improved de novo genome assembly: linked-read sequencing combined with optical mapping produce a high quality mammalian genome at relatively low cost. bioRxiv 2017.
[5] Cao H, Hastie AR, Cao D, Lam ET, Sun Y, Huang H, et al. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology. GigaScience 2014;3:34.
[6] Mak AC, Lai YY, Lam ET, Kwok TP, Leung AK, Poon A, et al. Genome-wide structural variation detection by genome mapping on nanochannel arrays. Genetics 2016;202:351-62.
[7] Dixon J, Xu J, Dileep V, Zhan Y, Song F, Le VT, et al. An integrative framework for detecting structural variations in cancer genomes. bioRxiv 2017.
[8] Lee H, Deignan JL, Dorrani N, Strom SP, Kantarci S, Quintero-Rivera F, et al. Clinical exome sequencing for genetic identification of rare Mendelian disorders. JAMA 2014;312:1880-7.
[9] Jaratlerdsiri W, Chan EKF, Petersen DC, Yang C, Croucher PI, Bornman MSR, et al. Next generation mapping reveals novel large genomic rearrangements in prostate cancer. Oncotarget 2017;8:23588-602.
