AI · CL · CV | Feb 28, 2025

MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models

Qiao Yan, Yuchen Yuan, Xiaowei Hu, Yihan Wang, Jiaqi Xu, Jinpeng Li, Chi-Wing Fu, Pheng-Ann Heng

arXiv:2502.20780v1 · 4 citations · h-index: 29 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the critical issue of unreliable model outputs in medical applications, which can harm clinical decision-making, by providing a benchmark for both evaluating and mitigating hallucinations. The contribution is incremental, however, because it builds on existing VLM methods.

The paper tackles the problem of hallucinations in vision-language models (VLMs) used in healthcare by introducing MedHallTune, a large-scale benchmark with over 100,000 images and 1,000,000 instruction pairs, and shows that fine-tuning with it improves models' ability to manage hallucinations and boosts zero-shot performance on downstream tasks.

The increasing use of vision-language models (VLMs) in healthcare applications raises serious challenges related to hallucinations, in which a model generates seemingly plausible outputs that are in fact incorrect. Such hallucinations can jeopardize clinical decision-making and potentially harm diagnosis and treatment. In this work, we propose MedHallTune, a large-scale benchmark designed specifically to evaluate and mitigate hallucinations in medical VLMs. Comprising over 100,000 images and 1,000,000 instruction pairs, MedHallTune includes both hallucination and non-hallucination samples, each with ground-truth annotations. We conduct a comprehensive evaluation of current medical and general VLMs using MedHallTune, assessing their performance across key metrics, including clinical accuracy, relevance, detail level, and risk level. The experimental results show that fine-tuning with MedHallTune improves the ability of several existing models to manage hallucinations and boosts their zero-shot performance on downstream visual question answering (VQA) tasks, making them more reliable for practical medical applications. Our work contributes to the development of more trustworthy VLMs. Code and dataset will be available at https://github.com/russellyq/MedHallTune.
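To make the benchmark's structure concrete, here is a minimal sketch of how a single sample and a per-metric score summary might be represented. Everything here is an assumption for illustration only: the `InstructionPair` fields, the metric key names, and the idea of averaging numeric scores per metric are hypothetical and are not taken from the MedHallTune release. The sketch does not assume whether a higher or lower value is better for any given metric (for example, risk level).

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical keys mirroring the four criteria named in the abstract.
METRICS = ("clinical_accuracy", "relevance", "detail_level", "risk_level")


@dataclass
class InstructionPair:
    """Hypothetical benchmark sample: an image, an instruction, and its ground-truth answer."""
    image_path: str
    instruction: str
    ground_truth: str
    is_hallucination: bool  # True for a hallucination sample, False otherwise


def aggregate_scores(scores: list[dict[str, float]]) -> dict[str, float]:
    """Average each metric over all scored responses.

    Assumes every score dict contains a numeric value for each key in METRICS.
    """
    return {m: mean(s[m] for s in scores) for m in METRICS}


if __name__ == "__main__":
    sample = InstructionPair("scan_001.png", "Describe the lesion.", "No lesion is visible.", True)
    scores = [
        {"clinical_accuracy": 8.0, "relevance": 9.0, "detail_level": 6.0, "risk_level": 7.0},
        {"clinical_accuracy": 6.0, "relevance": 7.0, "detail_level": 8.0, "risk_level": 5.0},
    ]
    print(sample.is_hallucination, aggregate_scores(scores))
```

Keeping the metrics as named keys, rather than as a single combined score, preserves the per-criterion comparison across models that the abstract describes.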

