Generating high-quality finished genomes with accurate identification of structural variation and high completeness (minimal gaps) remains challenging using short-read sequencing technologies alone. The Saphyr™ Genome Imaging System provides direct visualization of long DNA molecules in their native state, bypassing the statistical inference needed to align paired-end reads with an uncertain insert size distribution. These long, labeled molecules are assembled de novo into physical maps spanning the entire diploid genome. The resulting maps make it possible to correctly position and orient sequence contigs into chromosome-scale scaffolds and to detect a large range of homozygous and heterozygous structural variation with very high efficiency.
Tumors are often composed of heterogeneous populations of cells, with certain cancer-driving mutations present at low allele fractions in the early stages of cancer development. Effective detection of such variants is critical for diagnosis and targeted treatment. However, typical short-read sequencing is expensive at the coverage depths needed to detect variants in rare clones. Short-read sequencing is also limited in its ability to span repeats in the genome, which results in high error rates in structural variant (SV) analysis. Based on specific labeling and mapping of ultra-high molecular weight DNA, we have developed a single-molecule platform that can detect disease-relevant SVs and give a high-resolution view of tumor heterogeneity. We have also developed a pipeline that effectively detects structural variants at low allele fractions; it includes single-molecule-based SV calling and fractional copy number analysis. Preliminary analyses using simulated data and well-characterized cancer samples showed high sensitivity for variants of different types at allele fractions as low as 5%, at a reasonable genomic coverage that is easily collected on a Bionano Saphyr Chip. The candidate variants are annotated and further prioritized based on control data and publicly available annotations such as DGV and dbVar. The data are imported into a graphical user interface tool that includes new visualization features (such as dynamic variant filtering, Circos diagrams, and report generation) for interactive visualization and curation. Together, these components allow for efficient analysis of cancer genomes of interest.
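To illustrate why coverage depth governs sensitivity at low allele fractions, the following is a minimal sketch under a simple binomial sampling model. It is not the pipeline described above: the function name `detection_probability`, the `min_support` threshold, and the coverage values are hypothetical, and the model assumes each molecule spanning a locus independently carries the variant with probability equal to the allele fraction.

```python
from math import comb


def detection_probability(coverage: int, allele_fraction: float, min_support: int) -> float:
    """Probability that at least `min_support` of `coverage` molecules carry the variant.

    Assumes a binomial model: each molecule spanning the locus independently
    carries the variant with probability `allele_fraction` (an illustrative
    simplification, not the published method).
    """
    # Sum the probabilities of observing fewer than `min_support` variant molecules.
    p_fewer = sum(
        comb(coverage, k)
        * allele_fraction ** k
        * (1.0 - allele_fraction) ** (coverage - k)
        for k in range(min_support)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Hypothetical coverage depths, with a 5% allele fraction as in the abstract
    # and an assumed requirement of 3 supporting molecules.
    for depth in (100, 300, 1000):
        p = detection_probability(depth, 0.05, min_support=3)
        print(f"coverage {depth}x: P(detect) = {p:.4f}")
```

Under this model, the chance of sampling enough variant-bearing molecules rises with coverage for a fixed allele fraction, which is why the cost of achieving deep coverage matters for detecting variants in rare clones.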