Multi-pathology Chest X-ray Classification with Rejection Mechanisms
This work addresses a risk that matters to clinicians: overconfident predictions in AI-assisted medical imaging. The contribution is incremental, however, since it adds rejection strategies to an existing DenseNet backbone rather than proposing a new architecture.
The study tackles overconfidence in deep learning models for multi-label chest X-ray classification by introducing an uncertainty-aware framework with rejection mechanisms. On three large public datasets, selective rejection improved the trade-off between diagnostic accuracy and coverage, and entropy-based rejection achieved the highest average AUC across all pathologies.
Overconfidence in deep learning models poses a significant risk in high-stakes medical imaging tasks, particularly in multi-label classification of chest X-rays, where multiple co-occurring pathologies must be detected simultaneously. This study introduces an uncertainty-aware framework for chest X-ray diagnosis based on a DenseNet-121 backbone, enhanced with two selective prediction mechanisms: entropy-based rejection and confidence interval-based rejection. Both methods enable the model to abstain from uncertain predictions, improving reliability by deferring ambiguous cases to clinical experts. A quantile-based calibration procedure is employed to tune rejection thresholds using either global or class-specific strategies. Experiments conducted on three large public datasets (PadChest, NIH ChestX-ray14, and MIMIC-CXR) demonstrate that selective rejection improves the trade-off between diagnostic accuracy and coverage, with entropy-based rejection yielding the highest average AUC across all pathologies. These results support the integration of selective prediction into AI-assisted diagnostic workflows, providing a practical step toward safer, uncertainty-aware deployment of deep learning in clinical settings.
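The entropy-based rejection and quantile calibration described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the model outputs one sigmoid probability per pathology, that uncertainty is measured by the binary entropy of each probability, and that a threshold is set at a chosen quantile of the entropies on a held-out calibration set. The function names, the `coverage` parameter, the 0.5 decision cut-off, and the use of -1 to mark an abstention are all hypothetical choices made for illustration. Confidence interval-based rejection is not covered here.

```python
import numpy as np


def binary_entropy(p, eps=1e-12):
    """Entropy in bits of each Bernoulli probability; peaks at 1.0 when p = 0.5."""
    p = np.clip(p, eps, 1 - eps)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))


def calibrate_thresholds(cal_probs, coverage=0.8, per_class=True):
    """Pick entropy thresholds as a quantile of calibration-set entropies.

    cal_probs: array of shape (n_samples, n_classes) with sigmoid outputs.
    coverage:  assumed target fraction of predictions to keep.
    per_class: True for class-specific thresholds, False for one global value.
    """
    h = binary_entropy(cal_probs)
    if per_class:
        return np.quantile(h, coverage, axis=0)
    return np.full(cal_probs.shape[1], np.quantile(h, coverage))


def predict_with_rejection(probs, thresholds, cutoff=0.5):
    """Return 1/0 labels, or -1 where the entropy exceeds the class threshold."""
    h = binary_entropy(probs)
    labels = (probs >= cutoff).astype(int)
    return np.where(h <= thresholds, labels, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cal_probs = rng.uniform(size=(1000, 3))   # stand-in for model outputs
    thresholds = calibrate_thresholds(cal_probs, coverage=0.8)
    test_probs = np.array([[0.99, 0.50, 0.02]])
    print(predict_with_rejection(test_probs, thresholds))
```

Under these assumptions, a prediction near 0.5 is deferred to a clinician, while confident predictions near 0 or 1 are kept. Lowering `coverage` makes the model abstain more often, which is the accuracy-versus-coverage trade-off the abstract refers to.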