CV | CL | Jul 31, 2025

On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI

arXiv:2508.00171v1 | 1 citation | h-index: 22 | Has Code | Agentic AI/CREATE/Clinical MLLMs@MICCAI
Originality: Incremental advance
AI Analysis

The paper addresses the risk that biased AI models lead to misleading clinical decisions; its contribution is an incremental improvement in diagnostic evaluation methods.

The paper tackled the problem of multimodal clinical AI models exhibiting biases towards textual information over visual cues, and introduced Selective Modality Shifting (SMS) to quantify this reliance, revealing a marked dependency on text input across six VLMs on medical datasets.

Clinical decision-making relies on the integrated analysis of medical images and the associated clinical reports. While Vision-Language Models (VLMs) can offer a unified framework for such tasks, they can exhibit strong biases toward one modality, frequently overlooking critical visual cues in favor of textual information. In this work, we introduce Selective Modality Shifting (SMS), a perturbation-based approach to quantify a model's reliance on each modality in binary classification tasks. By systematically swapping images or text between samples with opposing labels, we expose modality-specific biases. We assess six open-source VLMs (four generalist models and two fine-tuned for medical data) on two medical imaging datasets with distinct modalities: MIMIC-CXR (chest X-ray) and FairVLMed (scanning laser ophthalmoscopy). By assessing the performance and calibration of every model in both unperturbed and perturbed settings, we reveal a marked dependency on text input, which persists despite the presence of complementary visual information. We also perform a qualitative attention-based analysis, which further confirms that image content is often overshadowed by text details. Our findings highlight the importance of designing and evaluating multimodal medical models that genuinely integrate visual and textual cues, rather than relying on single-modality signals.
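The swapping step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Sample` class, the `shift_modality` and `accuracy` helpers, the random choice of donor sample, and the toy text-only predictor are all hypothetical, and the sketch assumes each perturbed sample keeps its original label while one modality is replaced by that of a sample with the opposite label.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Sample:
    image: str  # stand-in for image data (hypothetical placeholder)
    text: str   # stand-in for the clinical report
    label: int  # binary label: 0 or 1


def shift_modality(samples: List[Sample], modality: str, seed: int = 0) -> List[Sample]:
    """Replace one modality of every sample with that of a randomly chosen
    sample from the opposite class. Labels are left unchanged (assumption)."""
    if modality not in ("image", "text"):
        raise ValueError("modality must be 'image' or 'text'")
    rng = random.Random(seed)
    by_label = {c: [s for s in samples if s.label == c] for c in (0, 1)}
    if not by_label[0] or not by_label[1]:
        raise ValueError("both classes must be present")
    shifted = []
    for s in samples:
        donor = rng.choice(by_label[1 - s.label])
        shifted.append(replace(s, **{modality: getattr(donor, modality)}))
    return shifted


def accuracy(predict: Callable[[Sample], int], samples: List[Sample]) -> float:
    """Fraction of samples whose prediction matches the original label."""
    return sum(predict(s) == s.label for s in samples) / len(samples)


def text_only_predict(sample: Sample) -> int:
    """Toy predictor that ignores the image entirely (extreme text bias)."""
    return 1 if "opacity" in sample.text else 0


toy_samples = [
    Sample(image="img_pos_a", text="opacity in left lobe", label=1),
    Sample(image="img_pos_b", text="diffuse opacity", label=1),
    Sample(image="img_neg_a", text="lungs are clear", label=0),
    Sample(image="img_neg_b", text="no acute findings", label=0),
]

acc_clean = accuracy(text_only_predict, toy_samples)
acc_image_shift = accuracy(text_only_predict, shift_modality(toy_samples, "image"))
acc_text_shift = accuracy(text_only_predict, shift_modality(toy_samples, "text"))
```

For this deliberately text-bound toy predictor, accuracy is unaffected when images are swapped and collapses when text is swapped. A gap of that kind between the two perturbed settings is the sort of signal the abstract describes as a dependency on text input; a model that integrates both modalities would be expected to degrade under either swap.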

