LG · ML · Dec 12, 2018

Bridging the Generalization Gap: Training Robust Models on Confounded Biological Data

arXiv:1812.04778v1 · 4 citations
Originality: Incremental advance
AI Analysis

This paper addresses generalization problems faced by researchers working with biological data, but the advance is incremental because it builds on existing techniques such as DANN.

The paper tackles poor generalization in statistical learning on biological data caused by confounding variables. It introduces ONION to remove confounding covariates and applies the existing DANN architecture to penalize models for encoding confounder information. It reports significant improvements in generalization on simulated and empirical patient data.

Statistical learning on biological data can be challenging due to confounding variables in sample collection and processing. Confounders can cause models to generalize poorly and result in inaccurate prediction performance metrics if models are not validated thoroughly. In this paper, we propose methods to control for confounding factors and further improve prediction performance. We introduce OrthoNormal basis construction In cOnfounding factor Normalization (ONION) to remove confounding covariates and use the Domain-Adversarial Neural Network (DANN) to penalize models for encoding confounder information. We apply the proposed methods to simulated and empirical patient data and show significant improvements in generalization.
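
The abstract does not spell out how ONION works, so the following is only a minimal sketch of the general idea its name suggests: build an orthonormal basis for the confounder subspace and project that subspace out of the features. It should not be read as the authors' algorithm. The function name `remove_confounders`, the added intercept column, and the simulated binary `batch` confounder are all assumptions made for illustration.

```python
import numpy as np

def remove_confounders(X, C):
    """Remove the part of each feature that is linear in the confounders.

    Illustrative only: a generic orthogonal-projection residualization,
    not the ONION procedure from the paper.
    """
    X = np.asarray(X, dtype=float)
    C = np.asarray(C, dtype=float)
    if C.ndim == 1:
        C = C[:, None]
    # Assumption: include an intercept so confounder-related mean shifts are removed too.
    design = np.column_stack([np.ones(len(C)), C])
    # Reduced QR yields an orthonormal basis Q for the confounder subspace.
    Q, _ = np.linalg.qr(design)
    # Subtract each feature's component lying in that subspace.
    return X - Q @ (Q.T @ X)

# Simulated data: three features shifted by a hypothetical binary batch effect.
rng = np.random.default_rng(0)
n = 200
batch = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, 3)) + 2.0 * batch[:, None]

X_clean = remove_confounders(X, batch)
# Largest absolute correlation between any cleaned feature and the confounder.
max_corr = max(abs(np.corrcoef(X_clean[:, j], batch)[0, 1]) for j in range(3))
```

After the projection, the cleaned features are linearly uncorrelated with `batch` up to floating-point error. A projection of this kind removes only linear dependence on the confounders, and in a real train/test setting the basis would normally be fit on training samples only, so that no test information leaks into the correction.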

