IV CV LG ME Dec 9, 2025

Causal Attribution of Model Performance Gaps in Medical Imaging Under Distribution Shifts

Pedro M. Gordaliza, Nataliia Molchanova, Jaume Banus, Thomas Sanchez, Meritxell Bach Cuadra

arXiv:2512.09094v1 5.1 h-index: 30

Originality: Incremental advance

AI Analysis

This work helps practitioners diagnose model performance degradation in medical imaging and choose targeted interventions based on deployment context. It is incremental in that it extends existing causal attribution frameworks to a specific domain.

The paper investigates the causal mechanisms behind performance drops in medical image segmentation models under distribution shifts. It extends causal attribution frameworks to quantify the contributions of acquisition protocols and annotation variability, and finds that annotation shifts dominate when crossing annotators (7.4% ± 8.9% DSC attribution), while acquisition shifts dominate when crossing imaging centers (6.5% ± 9.1% DSC attribution).

Deep learning models for medical image segmentation suffer significant performance drops due to distribution shifts, but the causal mechanisms behind these drops remain poorly understood. We extend causal attribution frameworks to high-dimensional segmentation tasks, quantifying how acquisition protocols and annotation variability independently contribute to performance degradation. We model the data-generating process through a causal graph and employ Shapley values to fairly attribute performance changes to individual mechanisms. Our framework addresses unique challenges in medical imaging: high-dimensional outputs, limited samples, and complex mechanism interactions. Validation on multiple sclerosis (MS) lesion segmentation across 4 centers and 7 annotators reveals context-dependent failure modes: annotation protocol shifts dominate when crossing annotators (7.4% ± 8.9% DSC attribution), while acquisition shifts dominate when crossing imaging centers (6.5% ± 9.1%). This mechanism-specific quantification enables practitioners to prioritize targeted interventions based on deployment context.
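
The Shapley-value attribution described in the abstract can be illustrated with a minimal sketch for two mechanisms. This is not the authors' implementation: the function name `shapley_attribution`, the mechanism labels, and the DSC numbers in `toy_dsc` are all hypothetical and invented for illustration. The sketch assumes a value function is already available that returns model performance when a given subset of mechanisms is shifted from the source to the target distribution; how the paper estimates that quantity is not shown here.

```python
from itertools import combinations
from math import factorial


def shapley_attribution(mechanisms, value):
    """Exact Shapley values over a small set of mechanisms.

    `value(S)` is assumed to return the performance metric (e.g. mean DSC)
    when the mechanisms in the frozenset S are shifted to the target
    distribution and all others are kept at the source distribution.
    """
    n = len(mechanisms)
    phi = {}
    for m in mechanisms:
        others = [x for x in mechanisms if x != m]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude m
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal effect of additionally shifting mechanism m
                total += weight * (value(s | {m}) - value(s))
        phi[m] = total
    return phi


# Hypothetical DSC values (made up, NOT results from the paper)
toy_dsc = {
    frozenset(): 0.80,                             # no shift (source)
    frozenset({"acquisition"}): 0.74,              # only acquisition shifted
    frozenset({"annotation"}): 0.76,               # only annotation shifted
    frozenset({"acquisition", "annotation"}): 0.68,  # full shift (target)
}

phi = shapley_attribution(["acquisition", "annotation"], toy_dsc.__getitem__)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f} DSC")
```

By construction, the per-mechanism contributions sum to the total performance change between the unshifted and fully shifted settings, which is the property that makes the attribution "fair" in the Shapley sense. Exact enumeration is only practical for a handful of mechanisms, since the number of coalitions grows exponentially.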
