CV, LG | Feb 16

Uncertainty-Aware Vision-Language Segmentation for Medical Imaging

Aryan Das, Tanishq Rachamalla, Koushik Biswas, Swalpa Kumar Roy, Vinay Kumar Verma

arXiv:2602.14498v1 | 1.5 | h-index: 19 | Has Code

Originality: Incremental advance

AI Analysis

This work targets precise medical diagnosis in complex clinical settings where image quality is poor. It is an incremental advance that combines uncertainty modelling with modality alignment.

The authors tackle medical image segmentation under ambiguity by introducing an uncertainty-aware multimodal framework that uses both radiological images and clinical text. They report superior segmentation performance and better computational efficiency than existing approaches on the QATA-COVID19, MosMed++, and Kvasir-SEG datasets.

We introduce a novel uncertainty-aware multimodal segmentation framework that leverages both radiological images and associated clinical text for precise medical diagnosis. We propose a Modality Decoding Attention Block (MoDAB) with a lightweight State Space Mixer (SSMix) to enable efficient cross-modal fusion and long-range dependency modelling. To guide learning under ambiguity, we propose the Spectral-Entropic Uncertainty (SEU) Loss, which jointly captures spatial overlap, spectral consistency, and predictive uncertainty in a unified objective. In complex clinical circumstances with poor image quality, this formulation improves model reliability. Extensive experiments on various publicly available medical datasets, QATA-COVID19, MosMed++, and Kvasir-SEG, demonstrate that our method achieves superior segmentation performance while being significantly more computationally efficient than existing State-of-the-Art (SoTA) approaches. Our results highlight the importance of incorporating uncertainty modelling and structured modality alignment in vision-language medical segmentation tasks. Code: https://github.com/arya-domain/UA-VLS
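The abstract names three ingredients of the SEU Loss (spatial overlap, spectral consistency, and predictive uncertainty) but does not give its formula. The sketch below is therefore only an illustration of how such terms could be combined, not the authors' formulation: it assumes a Dice term for overlap, a difference of 2-D FFT magnitudes for spectral consistency, and mean binary entropy for uncertainty. The function names (`dice_loss`, `spectral_loss`, `entropy_term`, `seu_like_loss`) and the weights `w_spec` and `w_ent` are hypothetical.

```python
import numpy as np


def dice_loss(prob, target, eps=1e-6):
    """Spatial-overlap term: 1 - soft Dice coefficient."""
    inter = np.sum(prob * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(prob) + np.sum(target) + eps)


def spectral_loss(prob, target):
    """Spectral term (assumed form): mean absolute gap between FFT magnitudes."""
    fp = np.abs(np.fft.fft2(prob, norm="ortho"))
    ft = np.abs(np.fft.fft2(target, norm="ortho"))
    return np.mean(np.abs(fp - ft))


def entropy_term(prob, eps=1e-6):
    """Uncertainty term (assumed form): mean per-pixel binary entropy."""
    p = np.clip(prob, eps, 1.0 - eps)
    return np.mean(-(p * np.log(p) + (1.0 - p) * np.log(1.0 - p)))


def seu_like_loss(prob, target, w_spec=0.1, w_ent=0.1):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return (
        dice_loss(prob, target)
        + w_spec * spectral_loss(prob, target)
        + w_ent * entropy_term(prob)
    )


if __name__ == "__main__":
    # Toy 8x8 mask with a square foreground region.
    target = np.zeros((8, 8))
    target[2:6, 2:6] = 1.0
    confident = target.copy()          # perfect, confident prediction
    uncertain = np.full((8, 8), 0.5)   # maximally uncertain prediction
    print("confident:", seu_like_loss(confident, target))
    print("uncertain:", seu_like_loss(uncertain, target))
```

In this toy setup a perfect, confident prediction scores close to zero, while a uniform 0.5 prediction is penalized by all three terms. For the actual loss definition, see the linked UA-VLS repository.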

