LG · QM · AP · ML · Jun 24, 2020

Bayesian Sampling Bias Correction: Training with the Right Loss Function

arXiv:2006.13798v1
Originality: Incremental advance
AI Analysis

This paper addresses a common but often overlooked problem in medical imaging and other domains: sampling bias in the training data causes model performance in the lab to diverge from performance in real-world use. It offers a systematic method for correcting that bias.

The paper tackles sampling bias in training datasets, for example when the prevalence of a pathology in a medical imaging dataset does not match its prevalence in the target population. It derives a family of bias-correcting loss functions from Bayesian risk minimization. Case studies on lung nodule malignancy grading illustrate the resulting improvement in model performance under realistic settings.

We derive a family of loss functions to train models in the presence of sampling bias. Examples are when the prevalence of a pathology differs from its sampling rate in the training dataset, or when a machine learning practitioner rebalances their training dataset. Sampling bias causes large discrepancies between model performance in the lab and in more realistic settings. It is omnipresent in medical imaging applications, yet is often overlooked at training time or addressed on an ad hoc basis. Our approach is based on Bayesian risk minimization. For arbitrary likelihood models we derive the associated bias-corrected loss for training, exhibiting a direct connection to information gain. The approach integrates seamlessly in the current paradigm of (deep) learning using stochastic backpropagation and naturally with Bayesian models. We illustrate the methodology on case studies of lung nodule malignancy grading.
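To make the prevalence mismatch described in the abstract concrete, the following is a minimal sketch of the standard prior-shift (label-shift) correction obtained from Bayes' rule. It is not the family of losses derived in the paper, only a well-known special case in the same spirit. The function names `correct_posterior` and `biased_nll`, and the assumption that both the training class frequencies and the true prevalences are known, are illustrative choices and do not come from the source.

```python
import math


def correct_posterior(probs, train_priors, target_priors):
    """Re-weight class probabilities predicted under the training class
    frequencies so that they reflect the target prevalence.

    Bayes' rule gives p_target(y|x) proportional to
    p_train(y|x) * target_prior(y) / train_prior(y).
    """
    weighted = [p * t / s for p, s, t in zip(probs, train_priors, target_priors)]
    total = sum(weighted)
    return [w / total for w in weighted]


def biased_nll(logits, label, train_priors, target_priors):
    """Negative log-likelihood of a label drawn from the biased training set,
    for a model whose logits are meant to describe the target population.

    The logits are shifted by log(train_prior / target_prior) before the
    softmax, so the unshifted logits can be used directly at deployment.
    """
    shifted = [
        z + math.log(s) - math.log(t)
        for z, s, t in zip(logits, train_priors, target_priors)
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(shifted)
    log_norm = m + math.log(sum(math.exp(z - m) for z in shifted))
    return log_norm - shifted[label]


# Hypothetical numbers: a balanced training set, 10% prevalence of class 1 in practice.
train_priors = [0.5, 0.5]
target_priors = [0.9, 0.1]
corrected = correct_posterior([0.5, 0.5], train_priors, target_priors)
loss = biased_nll([0.0, 0.0], 0, train_priors, target_priors)
```

In this sketch, `correct_posterior` adjusts the predictions of an already trained model, whereas `biased_nll` applies the same correction at training time, which is closer to the abstract's idea of training with a loss that accounts for the sampling process.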
