Hallucination Benchmark in Medical Visual Question Answering
This work addresses the reliability of AI in healthcare by identifying hallucination issues in medical VQA, though its contribution is incremental because it benchmarks the problem rather than solving it.
The authors tackled hallucination in medical visual question answering by creating a benchmark and evaluating state-of-the-art models on it, revealing the models' limitations and assessing the effectiveness of different prompting strategies.
The recent success of large language and vision models (LLVMs) on visual question answering (VQA), particularly their applications in medicine (Med-VQA), has shown great potential for realizing effective visual assistants for healthcare. However, these models have not been extensively tested for hallucination in clinical settings. Here, we created a hallucination benchmark of medical images paired with question-answer sets and conducted a comprehensive evaluation of state-of-the-art models. The study provides an in-depth analysis of current models' limitations and reveals the effectiveness of various prompting strategies.