LG · Oct 24, 2023

Robust Learning via Conditional Prevalence Adjustment

Minh Nguyen, Alan Q. Wang, Heejong Kim, Mert R. Sabuncu

arXiv:2310.15766v1 · 5.3 · 2 citations · h-index: 66 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses robustness in healthcare AI for anti-causal tasks, but the advance is incremental: it builds on existing methods and relies on specific assumptions.

The paper tackles the problem of deep learning models failing on unseen healthcare sites due to unstable correlations between confounding variables, proposing CoPA for anti-causal tasks and showing it outperforms baselines in experiments.

Healthcare data often come from multiple sites in which the correlations between confounding variables can vary widely. If deep learning models exploit these unstable correlations, they might fail catastrophically in unseen sites. Although many methods have been proposed to tackle unstable correlations, each has its limitations. For example, adversarial training forces models to completely ignore unstable correlations, but doing so may lead to poor predictive performance. Other methods (e.g., invariant risk minimization [4]) try to learn domain-invariant representations that rely only on stable associations by assuming a causal data-generating process (input X causes class label Y). Thus, they may be ineffective for anti-causal tasks (Y causes X), which are common in computer vision. We propose a method called CoPA (Conditional Prevalence-Adjustment) for anti-causal tasks. CoPA assumes that (1) the generation mechanism is stable, i.e., label Y and confounding variable(s) Z generate X, and (2) the unstable conditional prevalence in each site E fully accounts for the unstable correlations between X and Y. Our crucial observation is that confounding variables are routinely recorded in healthcare settings and the prevalence can be readily estimated, for example, from a set of (Y, Z) samples (no need for corresponding samples of X). CoPA can work even if there is a single training site, a scenario that is often overlooked by existing methods. Our experiments on synthetic and real data show that CoPA outperforms competitive baselines.
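The two assumptions above imply a simple factorization: if p(x | y, z) is shared by all sites and only the conditional prevalence p_e(y | z) changes, Bayes' rule gives p_t(y | x, z) ∝ p_s(y | x, z) · p_t(y | z) / p_s(y | z) when moving from a training site s to a target site t. The NumPy sketch below illustrates that reasoning: it estimates p_e(y | z) by counting (Y, Z) records alone, then re-weights a classifier's predicted probabilities with a standard Bayes-rule prior-shift correction conditioned on Z. It is an illustration under these assumptions, not CoPA's actual implementation (the abstract does not say how CoPA trains or adjusts its model); the function names, the add-alpha smoothing, and the toy data are all hypothetical.

```python
import numpy as np


def estimate_conditional_prevalence(y, z, num_classes, num_z_values, alpha=1.0):
    """Estimate p_e(y | z) for one site from (Y, Z) records only (no X needed).

    Hypothetical helper. Add-alpha smoothing keeps unseen (z, y) cells nonzero.
    Returns an array of shape (num_z_values, num_classes); each row sums to 1.
    """
    counts = np.full((num_z_values, num_classes), alpha, dtype=float)
    # Unbuffered add so repeated (z, y) pairs are each counted.
    np.add.at(counts, (np.asarray(z), np.asarray(y)), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)


def adjust_posterior(probs_source, z, prev_source, prev_target):
    """Move p_s(y | x, z) to a target site, assuming p(x | y, z) is shared.

    Hypothetical helper implementing
        p_t(y | x, z) ∝ p_s(y | x, z) * p_t(y | z) / p_s(y | z).
    probs_source: (n, num_classes) predicted probabilities at the source site.
    z: (n,) integer confounder values for the same samples.
    """
    z = np.asarray(z)
    weights = prev_target[z] / prev_source[z]  # per-sample prevalence ratios
    unnormalized = probs_source * weights
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)


# Toy training site: binary label Y and binary confounder Z (hypothetical data).
y_train = np.array([1, 1, 1, 0, 0, 0, 0, 0])
z_train = np.array([1, 1, 1, 1, 0, 0, 0, 0])
prev_train = estimate_conditional_prevalence(y_train, z_train, num_classes=2, num_z_values=2)

# Target site with balanced prevalence p_t(y | z) = 0.5 for both values of Z.
prev_target = np.full((2, 2), 0.5)

# One sample with Z = 1 that the source-site model scores as 50/50.
probs = np.array([[0.5, 0.5]])
adjusted = adjust_posterior(probs, np.array([1]), prev_train, prev_target)
print(prev_train)
print(adjusted)
```

Only (Y, Z) counts from the target site enter the adjustment, so this style of correction works with a single training site, the setting the abstract highlights. In the toy run, the training site over-represents Y = 1 when Z = 1, so moving to a balanced target prevalence shifts the uncertain prediction toward Y = 0.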

