CV, Dec 11, 2025

Synthetic Vasculature and Pathology Enhance Vision-Language Model Reasoning

arXiv:2512.11060v1
Originality: Incremental advance
AI Analysis

This paper addresses the shortage of interpretable diagnostic tools for ophthalmology clinicians by providing a scalable synthetic data solution. The advance is incremental because it builds on existing VLM methods.

The paper tackles the scarcity of large-scale image-text datasets for training Vision-Language Models (VLMs) in specialized medical domains such as OCTA imaging. It introduces Synthetic Vasculature Reasoning (SVR), a framework that synthesizes realistic retinal vasculature images with diabetic retinopathy features together with corresponding granular reasoning texts. A VLM trained on this data achieves 89.67% zero-shot balanced classification accuracy on real OCTA images and produces higher-quality explanations.

Vision-Language Models (VLMs) offer a promising path toward interpretable medical diagnosis by allowing users to ask about clinical explanations alongside predictions and across different modalities. However, training VLMs for detailed reasoning requires large-scale image-text datasets. In many specialized domains, for example in reading Optical Coherence Tomography Angiography (OCTA) images, such precise text with grounded description of pathologies is scarce or even non-existent. To overcome this bottleneck, we introduce Synthetic Vasculature Reasoning (SVR), a framework that controllably synthesizes images and corresponding text, specifically realistic retinal vasculature with Diabetic Retinopathy (DR) features (capillary dropout, microaneurysms, neovascularization, and tortuosity), while automatically generating granular reasoning texts. Based on this, we curate OCTA-100K-SVR, an OCTA image-reasoning dataset with 100,000 pairs. Our experiments show that a general-purpose VLM (Qwen3-VL-8b) trained on the dataset achieves a zero-shot balanced classification accuracy of 89.67% on real OCTA images, outperforming supervised baselines. Through human expert evaluation, we also demonstrate that it significantly enhances explanation quality and pathology localization on clinical data.
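
The headline number above is a balanced classification accuracy, which is the mean of per-class recall rather than the fraction of all samples classified correctly. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name `balanced_accuracy` and the class labels `"healthy"` and `"DR"` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not data from the paper).
y_true = ["healthy"] * 8 + ["DR"] * 2
y_pred = ["healthy"] * 8 + ["healthy", "DR"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # fraction of all samples correct
print(balanced_accuracy(y_true, y_pred))  # mean of per-class recall
```

On imbalanced toy data like this, plain accuracy is dominated by the majority class, while balanced accuracy weights each class equally. That is why balanced accuracy is commonly reported for medical classification tasks where class frequencies differ.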

