CL | AI | CV | Jan 11, 2024

Hallucination Benchmark in Medical Visual Question Answering

arXiv:2401.05827v2 | 24 citations | h-index: 8 | Tiny Papers @ ICLR
Originality: Synthesis-oriented
AI Analysis

This paper addresses the reliability of AI in healthcare by identifying hallucination issues in medical VQA. The contribution is incremental, since it benchmarks the problem rather than solving it.

The authors tackled the problem of hallucination in medical visual question answering by creating a benchmark and evaluating state-of-the-art models, revealing limitations and the effectiveness of prompting strategies.

The recent success of large language and vision models (LLVMs) on visual question answering (VQA), particularly in medical applications (Med-VQA), has shown great potential for realizing effective visual assistants for healthcare. However, these models have not been extensively tested for hallucination in clinical settings. Here, we created a hallucination benchmark of medical images paired with question-answer sets and conducted a comprehensive evaluation of state-of-the-art models. The study provides an in-depth analysis of current models' limitations and reveals the effectiveness of various prompting strategies.

Code Implementations: 1 repo
