CV, AI, IR · Apr 27

Retrieval-Guided Generation for Safer Histopathology Image Captioning

arXiv:2605.00893 · 21.1 · h-index: 11
AI Analysis

For medical image captioning, RGG offers a safer, more transparent alternative to generative models, reducing hallucination and factual inconsistency.

Retrieval-guided generation (RGG) improves semantic alignment in histopathology image captioning, achieving cosine similarity ~0.60 vs ~0.47 for MedGemma, with fewer unsupported diagnoses.

Generative vision-language models can produce fluent medical image captions but remain prone to hallucination, over-specific diagnostic claims, and factual inconsistency, all of which are serious issues in pathology. We investigate retrieval-guided generation (RGG) as a safer alternative, where captions are formed by summarizing expert text from visually similar cases rather than generated de novo. On the ARCH histopathology dataset, RGG improves semantic alignment with ground truth, achieving cosine similarity of $\approx$0.60 versus $\approx$0.47 for MedGemma, with non-overlapping confidence intervals indicating a robust gain. A pathologist-led qualitative review shows better preservation of morphology-relevant terminology and fewer unsupported diagnoses, while revealing failure modes such as concept mixing and inherited over-specific labeling. Overall, retrieval-guided captioning offers a more transparent and reliable approach with clearer opportunities for auditing than fully generative methods.
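The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy three-dimensional embeddings, and the placeholder captions are all assumptions made for illustration, and the summarization step that turns retrieved expert text into a final caption is omitted. The sketch only shows ranking stored cases by cosine similarity to a query image embedding and returning the expert captions of the closest ones.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_captions(query_embedding, case_bank, k=2):
    """Return the expert captions of the k cases most similar to the query.

    `case_bank` is a hypothetical list of dicts with "embedding" and "caption" keys.
    """
    ranked = sorted(
        case_bank,
        key=lambda case: cosine_similarity(query_embedding, case["embedding"]),
        reverse=True,
    )
    return [case["caption"] for case in ranked[:k]]


# Toy data: embeddings and captions are placeholders, not taken from ARCH.
case_bank = [
    {"embedding": [1.0, 0.0, 0.0], "caption": "expert caption for case A"},
    {"embedding": [0.6, 0.8, 0.0], "caption": "expert caption for case B"},
    {"embedding": [0.0, 1.0, 0.0], "caption": "expert caption for case C"},
]

retrieved = retrieve_captions([1.0, 0.05, 0.0], case_bank, k=2)
print(retrieved)
```

In a retrieval-guided pipeline of this kind, the retrieved captions would then be passed to a summarizer, so every phrase in the output can be traced back to a specific expert-written source. The same cosine similarity function, applied to text embeddings of a produced caption and its ground-truth caption, corresponds to the kind of semantic alignment score the abstract reports.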
