IV, CV, LG, ME
Dec 9, 2025

Causal Attribution of Model Performance Gaps in Medical Imaging Under Distribution Shifts

arXiv:2512.09094v1
h-index: 30
Originality: Incremental advance
AI Analysis

This work addresses model performance degradation in medical imaging and lets practitioners target interventions to their deployment context. It is incremental in that it extends existing causal attribution frameworks to a specific domain.

The paper investigates the causal mechanisms behind performance drops in medical image segmentation models under distribution shifts. It extends causal attribution frameworks to quantify the separate contributions of acquisition protocols and annotation variability. Annotation shifts dominate when crossing annotators (7.4% ± 8.9% DSC attribution), while acquisition shifts dominate when crossing imaging centers (6.5% ± 9.1% DSC attribution).

Deep learning models for medical image segmentation suffer significant performance drops due to distribution shifts, but the causal mechanisms behind these drops remain poorly understood. We extend causal attribution frameworks to high-dimensional segmentation tasks, quantifying how acquisition protocols and annotation variability independently contribute to performance degradation. We model the data-generating process through a causal graph and employ Shapley values to fairly attribute performance changes to individual mechanisms. Our framework addresses unique challenges in medical imaging: high-dimensional outputs, limited samples, and complex mechanism interactions. Validation on multiple sclerosis (MS) lesion segmentation across 4 centers and 7 annotators reveals context-dependent failure modes: annotation protocol shifts dominate when crossing annotators (7.4% $\pm$ 8.9% DSC attribution), while acquisition shifts dominate when crossing imaging centers (6.5% $\pm$ 9.1%). This mechanism-specific quantification enables practitioners to prioritize targeted interventions based on deployment context.
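The Shapley attribution step described in the abstract can be illustrated with a minimal sketch. It assumes a value function that returns model performance (for example, mean DSC) when a given subset of mechanisms is shifted from the source to the target distribution. The function name `shapley_attribution`, the mechanism labels, and the DSC numbers in `toy_dsc` are hypothetical and chosen only for illustration; they are not taken from the paper, and how the paper estimates the value function is not specified here.

```python
from itertools import combinations
from math import factorial


def shapley_attribution(mechanisms, value):
    """Exact Shapley values over a small set of mechanisms.

    `value(subset)` is assumed to return performance (e.g. mean DSC) when
    exactly the mechanisms in `subset` follow the target distribution and
    all others stay at the source distribution.
    """
    n = len(mechanisms)
    phi = {}
    for m in mechanisms:
        others = [x for x in mechanisms if x != m]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal effect of shifting mechanism m on top of subset s.
                total += weight * (value(s | {m}) - value(s))
        phi[m] = total
    return phi


# Hypothetical DSC values for each combination of shifted mechanisms.
toy_dsc = {
    frozenset(): 0.80,                              # no shift (source)
    frozenset({"acquisition"}): 0.74,               # only acquisition shifted
    frozenset({"annotation"}): 0.76,                # only annotation shifted
    frozenset({"acquisition", "annotation"}): 0.68, # full shift (target)
}

attribution = shapley_attribution(
    ["acquisition", "annotation"], toy_dsc.__getitem__
)
for name, contribution in attribution.items():
    print(f"{name}: {contribution:+.3f} DSC")
```

By the efficiency property of Shapley values, the per-mechanism contributions sum to the total performance change between the unshifted and fully shifted settings, which is what makes the attribution a complete decomposition of the gap.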

