Debiasing Counterfactuals In the Presence of Spurious Correlations
This work addresses a critical issue in medical AI by improving model robustness and interpretability for healthcare applications, though it is an incremental advance that combines existing techniques.
The paper tackles the tendency of deep learning models in medical imaging to rely on spurious correlations rather than causal markers, which limits generalization. It introduces an end-to-end training framework that integrates debiasing classifiers with counterfactual image generation so that the model learns generalizable markers and ignores spurious correlations.
Deep learning models can perform well in complex medical imaging classification tasks even when they base their conclusions on spurious correlations (i.e. confounders) that are prevalent in the training dataset rather than on the causal image markers of interest. This limits their ability to generalize across the population. Explainability based on counterfactual image generation can expose the confounders but does not provide a strategy to mitigate the bias. In this work, we introduce the first end-to-end training framework that integrates both (i) popular debiasing classifiers (e.g. distributionally robust optimization (DRO)) to avoid latching onto the spurious correlations and (ii) counterfactual image generation to unveil generalizable imaging markers of relevance to the task. Additionally, we propose a novel metric, the Spurious Correlation Latching Score (SCLS), to quantify the extent of the classifier's reliance on the spurious correlation as exposed by the counterfactual images. Through comprehensive experiments on two public datasets (with simulated and real visual artifacts), we demonstrate that the debiasing method (i) learns generalizable markers across the population and (ii) successfully ignores spurious correlations and focuses on the underlying disease pathology.
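To make the DRO idea concrete, the following is a minimal sketch of a group-wise worst-case objective, one common form of DRO. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say which DRO formulation is used, and the function name `worst_group_loss`, the group labels, and the loss values are all hypothetical.

```python
from collections import defaultdict


def worst_group_loss(losses, groups):
    """Return (largest per-group mean loss, dict of per-group mean losses).

    Assumption: each sample carries a group label, e.g. the combination of
    disease class and presence of the spurious artifact.
    """
    per_group = defaultdict(list)
    for loss, group in zip(losses, groups):
        per_group[group].append(loss)
    means = {g: sum(v) / len(v) for g, v in per_group.items()}
    return max(means.values()), means


# Hypothetical per-sample losses; group 0 is the majority (artifact present),
# group 1 is the minority (artifact absent).
losses = [0.1, 0.2, 0.3, 1.0, 1.4]
groups = [0, 0, 0, 1, 1]

worst, per_group_means = worst_group_loss(losses, groups)
average = sum(losses) / len(losses)

print("average loss:", round(average, 3))
print("per-group mean loss:", {g: round(m, 3) for g, m in per_group_means.items()})
print("worst-group loss:", round(worst, 3))
```

In this toy example the average loss is dominated by the majority group, while the worst-group loss exposes the poorly fit minority group. Minimizing the worst-group quantity instead of the average is what discourages a classifier from latching onto a shortcut that only holds for the majority.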