CV AI Sep 3, 2025

Lesion-Aware Visual-Language Fusion for Automated Image Captioning of Ulcerative Colitis Endoscopic Examinations

Alexis Ivan Lopez Escamilla, Gilberto Ochoa, Sharib Al

arXiv:2509.03011v16.21 citations DEMI@MICCAI

Originality: Incremental advance

AI Analysis

This work addresses the need for reliable, interpretable automated reporting in clinical gastroenterology. The advance is incremental, since it builds on existing components such as ResNet and T5.

The paper tackles automated image captioning for ulcerative colitis endoscopic exams by feeding lesion-aware visual features and clinical metadata into a T5 decoder, which improves caption quality and Mayo Endoscopic Score (MES) classification accuracy.

We present a lesion-aware image captioning framework for ulcerative colitis (UC). The model integrates ResNet embeddings, Grad-CAM heatmaps, and CBAM-enhanced attention with a T5 decoder. Clinical metadata (MES score 0-3, vascular pattern, bleeding, erythema, friability, ulceration) is injected as natural-language prompts to guide caption generation. The system produces structured, interpretable descriptions aligned with clinical practice and provides MES classification and lesion tags. Compared with baselines, our approach improves caption quality and MES classification accuracy, supporting reliable endoscopic reporting.
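
As a rough illustration of how clinical metadata might be turned into a natural-language prompt for the decoder, here is a minimal Python sketch. The abstract does not give the prompt template, so the class `UCMetadata`, the function `build_prompt`, the prefix string, and the example field values are all hypothetical; only the list of metadata fields and the MES range of 0-3 come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class UCMetadata:
    """Clinical metadata fields named in the abstract (value vocabulary is assumed)."""
    mes: int                # Mayo Endoscopic Score, 0-3
    vascular_pattern: str
    bleeding: str
    erythema: str
    friability: str
    ulceration: str


def build_prompt(meta: UCMetadata) -> str:
    """Serialize metadata into a text prompt (hypothetical template, not the paper's)."""
    if not 0 <= meta.mes <= 3:
        raise ValueError(f"MES must be in 0-3, got {meta.mes}")
    fields = [
        ("MES", str(meta.mes)),
        ("vascular pattern", meta.vascular_pattern),
        ("bleeding", meta.bleeding),
        ("erythema", meta.erythema),
        ("friability", meta.friability),
        ("ulceration", meta.ulceration),
    ]
    body = "; ".join(f"{name}: {value}" for name, value in fields)
    # Assumed task prefix, in the style commonly used with T5-type models.
    return f"describe endoscopy image. {body}"


example = UCMetadata(
    mes=2,
    vascular_pattern="absent",
    bleeding="none",
    erythema="marked",
    friability="present",
    ulceration="erosions",
)
prompt = build_prompt(example)
```

In a full pipeline, a string like `prompt` would be tokenized and combined with the visual features before decoding; how the paper fuses the two is not specified in the abstract, so that step is left out here.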
