CV · May 31

Ask4VG: Risk-Aware Question Selection for Reducing Prior-Driven Answers in Medical VQA

arXiv:2606.01044 · 54.1
Predicted impact: top 65% in CV · last 90 days
Originality: Incremental advance
AI Analysis

For medical VQA practitioners, this work targets answers hallucinated through prior-driven question-answer shortcuts and offers question selection as a complement to response-level mitigation.

Medical VQA models often rely on question-answer shortcuts rather than image evidence, leading to hallucinated answers. Ask4VG reduces this risk by reranking candidate question rewrites toward those whose answers are less invariant to missing or mismatched visual evidence. On VQA-RAD with Qwen2-VL-2B-Instruct, held-out counterfactual risk falls from 0.658 to 0.623 and exact accuracy rises from 0.337 to 0.356.

Medical visual question answering requires models to ground their responses in image evidence, because visually unsupported answers can mislead downstream interpretation. However, many medical VQA questions are generic, template-like, or highly similar in form, which can encourage models to learn question-answer shortcuts instead of image-dependent reasoning and thereby increase the risk of hallucinated responses. We propose Ask4VG, a label-free pilot framework for risk-aware question selection. Ask4VG estimates question-induced hallucination risk through counterfactual visual probing: the same question is asked under the original image, a perturbed image, a blank image, and a mismatched image, and the resulting answer relations are converted into weak supervision for a counterfactual risk estimator. The learned estimator then reranks candidate question rewrites to favor intent-preserving questions that are less invariant to missing or mismatched visual evidence before final answer generation. On VQA-RAD with Qwen2-VL-2B-Instruct, prompt-only rewriting increases counterfactual risk, whereas predicted-risk reranking reduces held-out risk from 0.658 to 0.623 and improves exact accuracy from 0.337 to 0.356. A 300-sample PMC-VQA external check shows the same direction of risk reduction with a small accuracy gain. These results suggest that question selection is a promising complement to response-level hallucination mitigation for reliable medical VQA.
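The probing-and-reranking idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`probe_risk`, `rerank_by_risk`, `select_question`), the equal-weight scoring rule, and the toy answers are all assumptions made for illustration. In particular, the abstract says the answer relations become weak supervision for a learned estimator; here the probe score is used directly as the risk function, and any learned estimator could be passed in its place.

```python
from typing import Callable, Dict, List


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def probe_risk(answers: Dict[str, str]) -> float:
    """Score one question from its answers under four image conditions.

    ASSUMED rule (not from the paper): each of the following counts as one
    risk signal, and the score is their mean.
      - the blank-image answer equals the original-image answer
      - the mismatched-image answer equals the original-image answer
      - the perturbed-image answer differs from the original-image answer
    """
    ref = normalize(answers["original"])
    signals = [
        normalize(answers["blank"]) == ref,
        normalize(answers["mismatched"]) == ref,
        normalize(answers["perturbed"]) != ref,
    ]
    return sum(signals) / len(signals)


def rerank_by_risk(candidates: List[str], risk_fn: Callable[[str], float]) -> List[str]:
    """Order candidate question rewrites from lowest to highest estimated risk."""
    return sorted(candidates, key=risk_fn)


def select_question(candidates: List[str], risk_fn: Callable[[str], float]) -> str:
    """Pick the lowest-risk rewrite to send to final answer generation."""
    return rerank_by_risk(candidates, risk_fn)[0]


# Toy, made-up model answers for two rewrites of the same intent.
toy_answers = {
    "Is there a fracture?": {
        "original": "yes", "perturbed": "yes", "blank": "yes", "mismatched": "yes",
    },
    "Does this chest X-ray show a rib fracture?": {
        "original": "yes", "perturbed": "yes", "blank": "no", "mismatched": "no",
    },
}


def toy_risk(question: str) -> float:
    return probe_risk(toy_answers[question])


best = select_question(list(toy_answers), toy_risk)
print(best, toy_risk(best))
```

In this toy setup the generic rewrite keeps answering "yes" even without image evidence, so it scores higher risk and the more image-dependent rewrite is selected. Checking that a rewrite preserves the original intent is outside the sketch.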

