CV, AI, CL, LG | Jul 31, 2024

Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering

arXiv:2407.21368v3 | 11 citations | h-index: 18
Originality: Incremental advance
AI Analysis

This work targets the diagnostic accuracy of medical AI tools used by clinicians. It is an incremental advance: it builds on existing MLVLM methods and adds prompting enhancements.

The paper tackles hallucination and imbalanced training data in Medical Large Vision-Language Models (MLVLMs) that diagnose pathologies via Visual Question Answering. It proposes two prompting strategies that improve diagnostic F1 scores by up to 0.27 on MIMIC-CXR-JPG and CheXpert. Applied to general-domain LVLMs and evaluated with POPE metrics, the same strategies improve Recall by about 0.07.

Large Vision-Language Models (LVLMs) have achieved significant success in recent years, and they have been extended to the medical domain. Although they demonstrate satisfactory performance on medical Visual Question Answering (VQA) tasks, Medical LVLMs (MLVLMs) suffer from the hallucination problem, which makes them fail to diagnose complex pathologies. Moreover, they readily fail to learn minority pathologies due to imbalanced training data. We propose two prompting strategies for MLVLMs that reduce hallucination and improve VQA performance. In the first strategy, we provide a detailed explanation of the queried pathology. In the second strategy, we fine-tune a cheap, weak learner to achieve high performance on a specific metric, and textually provide its judgment to the MLVLM. Tested on the MIMIC-CXR-JPG and CheXpert datasets, our methods significantly improve the diagnostic F1 score, with the highest increase being 0.27. We also demonstrate that our prompting strategies can be extended to general LVLM domains. Based on POPE metrics, they effectively suppress the false negative predictions of existing LVLMs and improve Recall by approximately 0.07.
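The two prompting strategies in the abstract can be illustrated with a minimal sketch of how the text side of a VQA query might be assembled. Everything here is an assumption for illustration: the function name `build_prompt`, the question template, and the wording of the weak-learner sentence are hypothetical, and the abstract does not give the paper's actual prompt text.

```python
from typing import Optional


def build_prompt(
    pathology: str,
    explanation: Optional[str] = None,
    weak_learner_positive: Optional[bool] = None,
) -> str:
    """Assemble a yes/no VQA question about one pathology.

    Hypothetical helper: the templates below are placeholders,
    not the prompts used in the paper.
    """
    parts = []

    # Strategy 1: prepend a detailed explanation of the queried pathology.
    if explanation:
        parts.append(explanation)

    # Strategy 2: state the judgment of a separately fine-tuned weak learner
    # as plain text, so the MLVLM can take it into account.
    if weak_learner_positive is not None:
        verdict = "present" if weak_learner_positive else "absent"
        parts.append(
            f"A separate classifier judged {pathology} to be {verdict} in this image."
        )

    # The diagnostic question itself always comes last.
    parts.append(f"Is there {pathology} in this image? Answer yes or no.")
    return " ".join(parts)


# Example usage with an illustrative, hand-written explanation.
prompt = build_prompt(
    "atelectasis",
    explanation="Atelectasis is a partial or complete collapse of a lung or a lobe of a lung.",
    weak_learner_positive=True,
)
print(prompt)
```

Both additions are optional arguments in this sketch, so the same function also produces the plain baseline question when neither is supplied, which makes it easy to compare the strategies separately or together.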

