Understanding Silent Failures in Medical Image Classification
This work addresses the reliability of classification systems in medical applications, which is essential for patient safety. The contribution is incremental in that it builds on existing methods for failure detection.
The paper tackles the problem of silent failures in medical image classification. It conducts a comprehensive analysis of confidence scoring functions (CSFs) under distribution shifts, finds that none of them reliably prevents silent failures, and introduces SF-Visuals, an interactive tool for visualizing and analyzing these failures.
To ensure the reliable use of classification systems in medical applications, it is crucial to prevent silent failures. This can be achieved by either designing classifiers that are robust enough to avoid failures in the first place, or by detecting remaining failures using confidence scoring functions (CSFs). A predominant source of failures in image classification is distribution shifts between training data and deployment data. To understand the current state of silent failure prevention in medical imaging, we conduct the first comprehensive analysis comparing various CSFs in four biomedical tasks and a diverse range of distribution shifts. Based on the result that none of the benchmarked CSFs can reliably prevent silent failures, we conclude that a deeper understanding of the root causes of failures in the data is required. To facilitate this, we introduce SF-Visuals, an interactive analysis tool that uses latent space clustering to visualize shifts and failures. On the basis of various examples, we demonstrate how this tool can help researchers gain insight into the requirements for safe application of classification systems in the medical domain. The open-source benchmark and tool are at: https://github.com/IML-DKFZ/sf-visuals.
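To make the idea of detecting failures with a confidence scoring function concrete, the following is a minimal sketch rather than the benchmark's implementation. It assumes the maximum softmax response as the CSF (a common baseline, chosen here only for illustration) and treats a silent failure as a misclassified case whose confidence is at or above a threshold, so it would not be flagged for review. The function names, the toy logits, and the threshold of 0.8 are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msr_confidence(logits):
    """Maximum softmax response, used here as an illustrative CSF."""
    return max(softmax(logits))


def find_silent_failures(logits_batch, labels, threshold):
    """Return indices of misclassified cases with confidence >= threshold.

    These are the cases a threshold-based filter would let through
    even though the prediction is wrong.
    """
    silent = []
    for i, (logits, label) in enumerate(zip(logits_batch, labels)):
        pred = max(range(len(logits)), key=logits.__getitem__)
        if pred != label and msr_confidence(logits) >= threshold:
            silent.append(i)
    return silent


# Toy data (hypothetical): three cases, three classes.
logits_batch = [
    [4.0, 0.0, 0.0],  # confident and correct
    [0.2, 0.1, 0.0],  # wrong, but low confidence, so it would be flagged
    [0.0, 5.0, 0.0],  # wrong and highly confident: a silent failure
]
labels = [0, 1, 0]

silent = find_silent_failures(logits_batch, labels, threshold=0.8)
print(silent)
```

In this toy setting only the third case counts as a silent failure: the second case is also misclassified, but its low confidence means a threshold-based filter would catch it. Under a distribution shift, a CSF that stays overconfident on wrong predictions produces more cases of the third kind.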