IV | CV | Jul 25, 2025

Querying GI Endoscopy Images: A VQA Approach

arXiv:2507.21165v15.11 citations | h-index: 1 | CLEF
Originality: Synthesis-oriented
AI Analysis

This work addresses clinicians' need for accurate and efficient AI-assisted diagnosis of GI diseases, but its contribution is incremental: it adapts an existing model to a specific domain.

The study tackles poor VQA performance in medical imaging by adapting the Florence2 model to GI endoscopy images, and it evaluates the adapted model with ROUGE, BLEU, and METEOR.

Visual Question Answering (VQA) combines Natural Language Processing (NLP) with image understanding to answer questions about a given image. It has enormous potential for the development of medical diagnostic AI systems. Such a system can help clinicians diagnose gastrointestinal (GI) diseases accurately and efficiently. Although many of the multimodal LLMs available today have excellent VQA capabilities in the general domain, they perform very poorly on VQA tasks in specialized domains such as medical imaging. This study is a submission for ImageCLEFmed-MEDVQA-GI 2025 subtask 1 that explores the adaptation of the Florence2 model to answer medical visual questions on GI endoscopy images. We also evaluate the model's performance using standard metrics such as ROUGE, BLEU, and METEOR.
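To make the overlap-based metrics named above concrete, here is a minimal sketch of a ROUGE-1 style F1 score between a reference answer and a generated answer. It is not the authors' evaluation code: the whitespace tokenizer, the function names, and the example strings are assumptions chosen for illustration, and official scorers apply their own normalization and stemming.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization. Real scorers
    # usually normalize punctuation and may apply stemming.
    return text.lower().split()


def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference and a candidate answer."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))

    # Multiset intersection: each token counts at most as often as it
    # appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical question-answer pair, not taken from the dataset.
reference_answer = "a polyp is visible"
model_answer = "a polyp is present"
score = rouge1_f1(reference_answer, model_answer)
```

In this example three of the four tokens match in each direction, so precision and recall are both 0.75. BLEU and METEOR build on the same idea of token overlap but add n-gram precision with a brevity penalty and synonym or stem matching, respectively.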
