GNMLApr 14, 2016

Variational inference for rare variant detection in deep, heterogeneous next-generation sequencing data

arXiv:1604.04280v2
Originality Incremental advance
AI Analysis

This work addresses the challenge of accurately identifying rare variants in mixed biological samples, which is crucial for understanding genetic heterogeneity, but it is incremental as it builds on existing statistical methods with computational improvements.

The authors tackled the problem of detecting rare genetic variants in noisy, heterogeneous next-generation sequencing data by proposing a novel Bayesian model with a variational EM algorithm, achieving comparable sensitivity and specificity to MCMC methods while being more computationally efficient, with higher specificity than many state-of-the-art algorithms in tests on low-coverage data.

The detection of rare variants is important for understanding the genetic heterogeneity in mixed samples. Recently, next-generation sequencing (NGS) technologies have enabled the identification of single nucleotide variants (SNVs) in mixed samples with high resolution. Yet, the noise inherent in the biological processes involved in next-generation sequencing necessitates the use of statistical methods to identify true rare variants. We propose a novel Bayesian statistical model and a variational expectation-maximization (EM) algorithm to estimate non-reference allele frequency (NRAF) and identify SNVs in heterogeneous cell populations. We demonstrate that our variational EM algorithm has comparable sensitivity and specificity compared with a Markov Chain Monte Carlo (MCMC) sampling inference algorithm, and is more computationally efficient on tests of low coverage ($27\times$ and $298\times$) data. Furthermore, we show that our model with a variational EM inference algorithm has higher specificity than many state-of-the-art algorithms. In an analysis of a directed evolution longitudinal yeast data set, we are able to identify a time-series trend in non-reference allele frequency and detect novel variants that have not yet been reported. Our model also detects the emergence of a beneficial variant earlier than was previously shown, and a pair of concomitant variants.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes