LG · Mar 8, 2023

Deep Hypothesis Tests Detect Clinically Relevant Subgroup Shifts in Medical Images

Lisa M. Koch, Christian M. Schürch, Christian F. Baumgartner, Arthur Gretton, Philipp Berens

arXiv:2303.04862v13.82 citations · h-index: 66 · Has Code

Originality Synthesis-oriented

AI Analysis

This paper addresses undetected distribution shifts, an obstacle to the safe deployment of ML in healthcare. Its contribution is incremental, since it applies existing methods to a specific domain.

The paper tackles the problem of detecting subgroup shifts in medical images, which can cause performance drops in deployed machine learning systems. It demonstrates that state-of-the-art statistical tests detect these shifts effectively, both in synthetic experiments and on real-world histopathology and retinal fundus images.

Distribution shifts remain a fundamental problem for the safe application of machine learning systems. If undetected, they may impact the real-world performance of such systems or will at least render original performance claims invalid. In this paper, we focus on the detection of subgroup shifts, a type of distribution shift that can occur when subgroups have a different prevalence during validation compared to the deployment setting. For example, algorithms developed on data from various acquisition settings may be predominantly applied in hospitals with lower quality data acquisition, leading to an inadvertent performance drop. We formulate subgroup shift detection in the framework of statistical hypothesis testing and show that recent state-of-the-art statistical tests can be effectively applied to subgroup shift detection on medical imaging data. We provide synthetic experiments as well as extensive evaluation on clinically meaningful subgroup shifts on histopathology as well as retinal fundus images. We conclude that classifier-based subgroup shift detection tests could be a particularly useful tool for post-market surveillance of deployed ML systems.
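To make the idea of classifier-based subgroup shift detection concrete, the following is a minimal sketch of a classifier two-sample test on toy data. It is not the authors' implementation: the 2-D Gaussian subgroups, the nearest-centroid classifier, the normal approximation to the null distribution, and all function names (`sample_mixture`, `c2st_p_value`) are assumptions made for illustration. The paper works with medical images and deep models, which this sketch does not attempt to reproduce.

```python
import math
import random


def sample_mixture(n, prevalence, rng):
    """Draw n 2-D points; each belongs to subgroup B with probability
    `prevalence`, otherwise to subgroup A (toy stand-ins for image subgroups)."""
    points = []
    for _ in range(n):
        cx, cy = (2.0, 2.0) if rng.random() < prevalence else (0.0, 0.0)
        points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
    return points


def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def c2st_p_value(source, target, rng):
    """Classifier two-sample test (illustrative).

    H0: source and target come from the same distribution.
    A domain classifier is fit on half of the pooled data; if its held-out
    accuracy is significantly above chance, H0 is rejected.
    Assumes equally sized source and target samples.
    """
    labelled = [(x, 0) for x in source] + [(x, 1) for x in target]
    rng.shuffle(labelled)
    half = len(labelled) // 2
    train, test = labelled[:half], labelled[half:]

    # Nearest-centroid domain classifier (a deliberately simple stand-in
    # for a deep classifier).
    c0 = centroid([x for x, y in train if y == 0])
    c1 = centroid([x for x, y in train if y == 1])
    correct = sum(
        1 for x, y in test if (sq_dist(x, c1) < sq_dist(x, c0)) == (y == 1)
    )

    n = len(test)
    accuracy = correct / n
    # Under H0, accuracy is approximately Normal(0.5, 1 / (4n)).
    z = (accuracy - 0.5) / math.sqrt(0.25 / n)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # one-sided upper tail
    return p_value, accuracy


rng = random.Random(0)
validation = sample_mixture(500, prevalence=0.1, rng=rng)   # subgroup B is rare
deployment = sample_mixture(500, prevalence=0.5, rng=rng)   # subgroup B is common
same_site = sample_mixture(500, prevalence=0.1, rng=rng)    # no subgroup shift

p_shift, acc_shift = c2st_p_value(validation, deployment, rng)
p_same, acc_same = c2st_p_value(validation, same_site, rng)
print(f"shifted prevalence:   accuracy={acc_shift:.3f}, p={p_shift:.3g}")
print(f"unchanged prevalence: accuracy={acc_same:.3f}, p={p_same:.3g}")
```

In this toy setting the two subgroups are unchanged and only their prevalence differs between validation and deployment, which matches the definition of subgroup shift given above. A small p-value in the first comparison would flag the shift for further investigation; the significance level used to make that call is a choice left to whoever runs the surveillance.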
