CVMay 8, 2025

CAG-VLM: Fine-Tuning of a Large-Scale Model to Recognize Angiographic Images for Next-Generation Diagnostic Systems

Yuto Nakamura, Satoshi Kodera, Haruki Settai, Hiroki Shinohara, Masatsugu Tamura, Tomohiro Noguchi, Tatsuki Furusawa, Ryo Takizawa, Tempei Kabayama, Norihiko Takeda

arXiv:2505.04964v23.6h-index: 9Has Code

Originality Incremental advance

AI Analysis

This work addresses the problem of reducing reliance on expert cardiologists for interpreting CAG images, though it is incremental as it builds on existing VLMs with fine-tuning.

The researchers tackled the challenge of automating interpretation of coronary angiography (CAG) images for AI-based decision support by fine-tuning vision-language models (VLMs) on a bilingual dataset, resulting in a model (CAG-VLM) that achieved a top clinician rating of 7.20/10 for generating clinical reports and treatment recommendations.

Coronary angiography (CAG) is the gold-standard imaging modality for evaluating coronary artery disease, but its interpretation and subsequent treatment planning rely heavily on expert cardiologists. To enable AI-based decision support, we introduce a two-stage, physician-curated pipeline and a bilingual (Japanese/English) CAG image-report dataset. First, we sample 14,686 frames from 539 exams and annotate them for key-frame detection and left/right laterality; a ConvNeXt-Base CNN trained on this data achieves 0.96 F1 on laterality classification, even on low-contrast frames. Second, we apply the CNN to 243 independent exams, extract 1,114 key frames, and pair each with its pre-procedure report and expert-validated diagnostic and treatment summary, yielding a parallel corpus. We then fine-tune three open-source VLMs (PaliGemma2, Gemma3, and ConceptCLIP-enhanced Gemma3) via LoRA and evaluate them using VLScore and cardiologist review. Although PaliGemma2 w/LoRA attains the highest VLScore, Gemma3 w/LoRA achieves the top clinician rating (mean 7.20/10); we designate this best-performing model as CAG-VLM. These results demonstrate that specialized, fine-tuned VLMs can effectively assist cardiologists in generating clinical reports and treatment recommendations from CAG images.

View on arXiv PDF

Similar