CV | May 22, 2025

MedCFVQA: A Causal Approach to Mitigate Modality Preference Bias in Medical Visual Question Answering

arXiv:2505.16209v2 | 3 citations | h-index: 25 | Proceedings of the First International Workshop on Vision-Language Models for Biomedical Applications
Originality: Incremental advance
AI Analysis

The paper addresses a critical bias issue in MedVQA for clinical diagnosis, though the contribution is incremental: it applies existing causal methods to a specific domain.

The paper tackles modality preference bias in Medical Visual Question Answering (MedVQA), where models rely too heavily on the question and ignore the image. It proposes MedCFVQA, a causal approach that combines counterfactual training with dataset reconstruction to mitigate the bias, and reports significant performance improvements over non-causal methods on datasets such as SLAKE and RadVQA.
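As a rough illustration of what removing a question-driven bias at inference can look like, the sketch below subtracts the scores of a question-only branch from the scores of the fused (image plus question) branch. This summary does not state MedCFVQA's actual formula, so the subtraction rule, the `alpha` weight, the function name `debiased_scores`, and the toy numbers are all assumptions made for illustration, not the authors' method.

```python
def debiased_scores(fused_logits, question_only_logits, alpha=1.0):
    """Subtract the question-only effect from the fused prediction.

    Hypothetical helper: `alpha` scales how much of the question-only
    score is removed. The paper's real rule may differ.
    """
    return [f - alpha * q for f, q in zip(fused_logits, question_only_logits)]


def argmax_answer(answers, scores):
    """Return the answer with the highest score."""
    return answers[scores.index(max(scores))]


# Toy example (made-up numbers): the question alone strongly suggests "yes".
answers = ["yes", "no"]
fused = [2.0, 1.5]          # image + question branch, still leaning "yes"
question_only = [1.8, 0.2]  # question-only branch, i.e. the language prior

biased_answer = argmax_answer(answers, fused)
debiased_answer = argmax_answer(answers, debiased_scores(fused, question_only))

print(biased_answer, debiased_answer)
```

In this toy case the fused branch alone picks "yes", while removing the question-only contribution flips the prediction to "no", the answer better supported by whatever the image contributed.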

Medical Visual Question Answering (MedVQA) is crucial for enhancing the efficiency of clinical diagnosis by providing accurate and timely responses to clinicians' inquiries regarding medical images. Existing MedVQA models suffer from modality preference bias, where predictions are heavily dominated by one modality while the other is overlooked (in MedVQA, questions usually dominate the answer and images are overlooked), so the models fail to learn multimodal knowledge. To overcome the modality preference bias, we propose a Medical CounterFactual VQA (MedCFVQA) model, which is trained with the bias present and leverages causal graphs to eliminate the bias during inference. Existing MedVQA datasets exhibit substantial prior dependencies between questions and answers, which results in acceptable performance even when a model suffers significantly from the modality preference bias. To address this issue, we reconstruct new datasets from existing MedVQA datasets by Changing the Prior dependencies (CP) between questions and their answers across the training and test sets. Extensive experiments demonstrate that MedCFVQA significantly outperforms its non-causal counterpart on the SLAKE and RadVQA datasets as well as on their reconstructed SLAKE-CP and RadVQA-CP variants.
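The point about prior dependencies can be made concrete with a small sketch. It fits a question-only baseline that never looks at an image and scores it on two test sets: one sharing the training answer distribution and one where that distribution has been changed. The data, the question-type key, and the function names are invented for illustration, and the way the changed-prior split is built here is an assumption, not the procedure used to create SLAKE-CP or RadVQA-CP.

```python
from collections import Counter, defaultdict


def fit_question_prior(train):
    """Map each question type to its most frequent training answer."""
    counts = defaultdict(Counter)
    for question_type, answer in train:
        counts[question_type][answer] += 1
    return {q: c.most_common(1)[0][0] for q, c in counts.items()}


def accuracy(prior, test):
    """Accuracy of answering from the question type alone (no image used)."""
    hits = sum(prior.get(question_type) == answer for question_type, answer in test)
    return hits / len(test)


# Toy data (made up): 80% of "is there" questions are answered "yes" in training.
train = [("is there", "yes")] * 8 + [("is there", "no")] * 2
same_prior_test = [("is there", "yes")] * 4 + [("is there", "no")] * 1
changed_prior_test = [("is there", "yes")] * 1 + [("is there", "no")] * 4

prior = fit_question_prior(train)
acc_same = accuracy(prior, same_prior_test)
acc_changed = accuracy(prior, changed_prior_test)

print(acc_same, acc_changed)
```

With these toy numbers the image-blind baseline answers 4 of 5 questions correctly when the test priors match training, but only 1 of 5 once the priors are changed, which is why a changed-prior split exposes a bias that a standard split can hide.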

