DBMF: A Dual-Branch Multimodal Framework for Out-of-Distribution Detection
This work addresses the reliability of clinical AI systems that encounter unseen disease cases, but it is incremental, building on existing multimodal approaches.
The paper tackles out-of-distribution detection in clinical deep learning systems by proposing a dual-branch multimodal framework, which improves on state-of-the-art OOD detection performance on endoscopic image datasets by up to 24.84%.
The complex and dynamic real-world clinical environment demands reliable deep learning (DL) systems. Out-of-distribution (OOD) detection plays a critical role in enhancing the reliability and generalizability of DL models when they encounter data that deviate from the training distribution, such as unseen disease cases. However, existing OOD detection methods typically rely either on a single visual modality or solely on image-text matching, and therefore fail to fully leverage multimodal information. To overcome this limitation, we propose a novel dual-branch multimodal framework that introduces a text-image branch and a vision branch. Our framework exploits multimodal representations to identify OOD samples through these two complementary branches. After training, we compute a score from the text-image branch ($S_t$) and a score from the vision branch ($S_v$), and integrate them into a final OOD score $S$, which is compared with a threshold for OOD detection. Comprehensive experiments on publicly available endoscopic image datasets demonstrate that our framework is robust across diverse backbones and improves state-of-the-art OOD detection performance by up to 24.84%.
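To make the scoring step concrete, here is a minimal sketch in Python. The abstract does not say how $S_t$ and $S_v$ are computed or how they are fused, so every choice below is an assumption: $S_t$ is taken as the maximum softmax over temperature-scaled cosine similarities between an image embedding and per-class text embeddings (an MCM-style score), $S_v$ as the maximum softmax probability of the vision branch's logits, and $S$ as a convex combination with a hypothetical weight `alpha`. The function names, `temperature`, `alpha`, and the threshold value are illustrative only and are not the authors' method.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def text_image_score(image_emb, class_text_embs, temperature=0.01) -> float:
    """Hypothetical S_t: max softmax over image-text cosine similarities."""
    sims = [cosine(image_emb, t) / temperature for t in class_text_embs]
    return max(softmax(sims))


def vision_score(logits: Sequence[float]) -> float:
    """Hypothetical S_v: maximum softmax probability of the vision branch."""
    return max(softmax(logits))


def ood_score(s_t: float, s_v: float, alpha: float = 0.5) -> float:
    """Hypothetical fusion: convex combination of the two branch scores."""
    return alpha * s_t + (1 - alpha) * s_v


def is_ood(s: float, threshold: float) -> bool:
    # Assumed convention: a higher S means "more in-distribution",
    # so samples scoring below the threshold are flagged as OOD.
    return s < threshold


# Toy usage with made-up 3-d embeddings for two known classes.
class_text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# An image that matches class 0 in both branches.
s_id = ood_score(
    text_image_score([0.9, 0.1, 0.0], class_text_embs),
    vision_score([4.0, 0.0]),
)
# An image unlike either class: flat similarities and flat logits.
s_ood = ood_score(
    text_image_score([0.0, 0.0, 1.0], class_text_embs),
    vision_score([0.1, 0.0]),
)

THRESHOLD = 0.75  # illustrative; in practice tuned on held-out ID data
print(s_id, is_ood(s_id, THRESHOLD))
print(s_ood, is_ood(s_ood, THRESHOLD))
```

The convex combination is the simplest fusion that keeps $S$ on the same scale as the branch scores. Other fusion rules, such as normalizing each score first or taking a product, would fit the abstract's description equally well.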