LGSTQM Jul 21, 2021

Preventing dataset shift from breaking machine-learning biomarkers

arXiv:2107.09947v1, 83 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses dataset shift for biomedical researchers who use machine-learning biomarkers. Its contribution is incremental: it offers an overview rather than a novel solution.

The paper addresses the problem of dataset shift undermining machine-learning biomarkers in biomedical research, providing an overview of detection and correction strategies to improve reliability.

Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g. because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning-extracted biomarkers, as well as detection and correction strategies.
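The abstract mentions detection and correction strategies without naming specific ones here. As a minimal sketch of one common approach, and not necessarily the authors' method, the code below trains a classifier to tell source-cohort samples from target-population samples. Held-out accuracy well above chance suggests a shift in the measured features, and the classifier's odds give importance weights for reweighting the source cohort. All function names (`fit_logistic`, `predict_proba`, `domain_classifier_check`) are hypothetical, and the approach assumes the shift is visible in the features (covariate shift).

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Plain gradient-descent logistic regression (illustrative, unregularized)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w

def predict_proba(X, w):
    """Probability that each row of X belongs to the target population."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def domain_classifier_check(X_source, X_target, seed=0):
    """Return (held-out source-vs-target accuracy, importance weights for source rows)."""
    rng = np.random.default_rng(seed)
    X = np.vstack([X_source, X_target])
    d = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])

    # Random half/half split: fit on one half, score on the other.
    idx = rng.permutation(len(X))
    half = len(X) // 2
    train, test = idx[:half], idx[half:]
    w = fit_logistic(X[train], d[train])
    accuracy = np.mean((predict_proba(X[test], w) > 0.5) == d[test])

    # Density-ratio estimate: classifier odds, corrected for cohort sizes.
    p = np.clip(predict_proba(X_source, w), 1e-6, 1 - 1e-6)
    weights = (p / (1 - p)) * (len(X_source) / len(X_target))
    return accuracy, weights
```

The returned weights could then be passed as per-sample weights when fitting or validating the biomarker model on the source cohort. Accuracy near 0.5 means the classifier found no difference in these features, which does not rule out shifts in unmeasured variables.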

Code Implementations: 1 repo
