ML
Oct 13, 2016

Removal of Batch Effects using Distribution-Matching Residual Networks

arXiv:1610.04181v6
177 citations
Originality: Incremental advance
AI Analysis

This paper addresses systematic measurement errors that can skew statistical analysis in biological research. The contribution appears incremental: it builds on existing distribution-matching techniques and uses a deep residual network to perform the matching.

The paper tackles systematic batch effects in biological data from technologies such as mass cytometry and single-cell RNA-seq. It proposes a residual network trained to minimize the Maximum Mean Discrepancy (MMD) between batches, and demonstrates its effectiveness on real datasets.

Sources of variability in experimentally derived data include measurement error in addition to the physical phenomena of interest. This measurement error is a combination of systematic components, originating from the measuring instrument, and random measurement errors. Several novel biological technologies, such as mass cytometry and single-cell RNA-seq, are plagued with systematic errors that may severely affect statistical analysis if the data is not properly calibrated. We propose a novel deep learning approach for removing systematic batch effects. Our method is based on a residual network, trained to minimize the Maximum Mean Discrepancy (MMD) between the multivariate distributions of two replicates, measured in different batches. We apply our method to mass cytometry and single-cell RNA-seq datasets, and demonstrate that it effectively attenuates batch effects.
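To make the training objective in the abstract concrete, the following is a minimal sketch of an empirical MMD estimate between two batches. It is not the authors' implementation: the single Gaussian kernel, the bandwidth `sigma`, the synthetic data, and the "oracle" correction that stands in for a trained residual network are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=2.0):
    # Biased (V-statistic) estimate of the squared MMD between samples x and y.
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())

# Synthetic stand-in for two replicates measured in different batches
# (assumption: the batch effect is a constant shift of 1.0 in every feature).
rng = np.random.default_rng(0)
target = rng.normal(size=(200, 5))
source = rng.normal(size=(200, 5)) + 1.0

# Oracle correction used as a placeholder for a learned map x -> x + delta(x);
# a residual network would have to learn delta from the data.
calibrated = source - 1.0

mmd_before = mmd2(source, target)
mmd_after = mmd2(calibrated, target)
print(f"MMD^2 before: {mmd_before:.4f}, after: {mmd_after:.4f}")
```

In this toy setting the estimate drops once the shift is removed, which is the quantity a distribution-matching network would be trained to reduce. The residual form x + delta(x) starts close to the identity map, a natural fit when the batch effect is a small perturbation of the data.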

Code Implementations: 1 repo