Nov 26, 2018

Interlacing Personal and Reference Genomes for Machine Learning Disease-Variant Detection

arXiv:1811.11674v1 · 2 citations
Originality: Incremental advance
AI Analysis

The paper addresses the problem of improving variant-calling accuracy for clinical applications and represents an incremental improvement over existing machine learning approaches.

The paper tackles genetic variant detection from DNA sequencing data. It introduces a method that generates images interlacing the personal and reference genomes in order to make fuller use of the sequencing reads. The results show improved performance on standard germline variant calling, and the approach extends to somatic variant calling with Siamese networks. The method is freely available for noncommercial use.
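To make the idea of an "interlaced" image concrete, here is a minimal sketch that assumes a simple scheme: each aligned read row is immediately followed by a copy of the reference row for the same window, with bases one-hot encoded over A, C, G, T. This is a hypothetical illustration, not the paper's exact encoding. The function names `one_hot` and `interlaced_image` and the `(offset, sequence)` read format are assumptions made for this example.

```python
# Hypothetical sketch of an interlaced read/reference image (not the paper's
# exact encoding). Each read row is followed by the reference row for the
# same genomic window; bases are one-hot encoded over A, C, G, T.

BASES = "ACGT"

def one_hot(base):
    """Return a 4-element one-hot vector; gaps/unknown bases map to all zeros."""
    return [1 if base == b else 0 for b in BASES]

def interlaced_image(reference, reads):
    """Build a (2 * len(reads)) x len(reference) x 4 nested-list image.

    reference: reference sequence for the window, e.g. "ACGTA".
    reads: list of (offset, sequence) pairs; offset is relative to the window start.
    """
    width = len(reference)
    ref_row = [one_hot(b) for b in reference]
    image = []
    for offset, seq in reads:
        row = [one_hot("-") for _ in range(width)]  # uncovered positions stay zero
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < width:  # clip bases that fall outside the window
                row[pos] = one_hot(base)
        image.append(row)            # personal (read) row
        image.append(list(ref_row))  # interlaced reference row
    return image

# Usage: the second read carries a T where the reference has a G (position 2),
# so the mismatch sits directly above the matching reference row.
ref = "ACGTA"
reads = [(0, "ACGTA"), (1, "CTT")]
img = interlaced_image(ref, reads)
print(len(img), len(img[0]))
```

Placing the reference row next to every read, rather than once at the top as in a conventional pile-up, keeps each read's differences from the reference spatially local, which is one plausible reason such a layout could help a convolutional model.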

DNA sequencing to identify genetic variants is becoming increasingly valuable in clinical settings. Assessment of variants in such sequencing data is commonly implemented through Bayesian heuristic algorithms. Machine learning has shown great promise in improving on these variant calls, but the input to these models is still typically a standardized "pile-up" image, which is not always the best representation. In this paper, we present a novel method for generating images from DNA sequencing data that interlaces the human reference genome with personalized sequencing output, to maximize usage of sequencing reads and improve machine learning algorithm performance. We demonstrate that this improves standard germline variant calling. We also extend this approach to somatic variant calling on paired tumor/normal data using Siamese networks. These approaches can be used in machine learning applications on sequencing data with the hope of improving clinical outcomes, and are freely available for noncommercial use at www.ccg.ai.
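The Siamese idea for tumor/normal data can be illustrated with a toy sketch: the same encoder, with the same weights, is applied to the tumor input and the normal input at a locus, and the distance between the two embeddings serves as a signal for a somatic difference. Everything here is an assumption for illustration. The single linear-plus-ReLU layer, the hand-set `weights`, the short feature vectors, and the function names `encode` and `siamese_distance` are hypothetical; a real model would use a trained convolutional encoder over the sequencing images.

```python
import math

def encode(features, weights):
    """Shared toy encoder: one linear layer followed by ReLU (untrained)."""
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]

def siamese_distance(tumor, normal, weights):
    """Encode both inputs with the SAME weights and return the Euclidean
    distance between their embeddings (larger = more dissimilar)."""
    z_tumor = encode(tumor, weights)
    z_normal = encode(normal, weights)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_tumor, z_normal)))

# Usage with hypothetical 3-element feature vectors for one locus.
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]
normal = [0.5, 0.0, 0.2]
tumor_same = [0.5, 0.0, 0.2]  # identical to normal: no somatic signal
tumor_alt = [0.5, 0.4, 0.2]   # differs in the second feature

d_same = siamese_distance(tumor_same, normal, weights)
d_alt = siamese_distance(tumor_alt, normal, weights)
print(d_same, d_alt)
```

Sharing weights between the two branches means both samples are mapped into the same embedding space, so any distance reflects differences between the inputs rather than differences between two separately learned encoders.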
