IV | CV | May 19, 2025

RetinaLogos: Fine-Grained Synthesis of High-Resolution Retinal Images Through Captions

Junzhi Ning, Cheng Tang, Kaijing Zhou, Diping Song, Lihao Liu, Ming Hu, Wei Li, Huihui Xu, Yanzhou Su, Tianbin Li, Jiyao Liu, Jin Ye

arXiv:2505.12887v3 | 15.23 citations | h-index: 17 | Has Code | MICCAI

Originality: Highly original

AI Analysis

RetinaLogos addresses a data bottleneck in developing machine learning models for ophthalmology: it offers a novel way to generate detailed synthetic retinal images that go beyond coarse disease labels.

The paper tackles the scarcity of high-quality labelled retinal imaging data by introducing RetinaLogos, a method that synthesizes high-resolution retinal images from captions with fine-grained semantic control. Ophthalmologists could not distinguish 62.07% of the synthetic images from real ones, and the synthetic data improved accuracy by 5%-10% in diabetic retinopathy grading and glaucoma detection.

The scarcity of high-quality, labelled retinal imaging data is a significant obstacle to developing machine learning models for ophthalmology. Existing methods for synthesising Colour Fundus Photographs (CFPs) largely rely on predefined disease labels, which restricts their ability to generate images that reflect fine-grained anatomical variations, subtle disease stages, and diverse pathological features beyond coarse class categories. To overcome these challenges, we first introduce an innovative pipeline that creates a large-scale, captioned retinal dataset comprising 1.4 million entries, called RetinaLogos-1400k. Specifically, RetinaLogos-1400k uses a visual language model (VLM) to describe retinal conditions and key structures, such as optic disc configuration, vascular distribution, nerve fibre layers, and pathological features. Building on this dataset, we employ a novel three-step training framework, RetinaLogos, which enables fine-grained semantic control over retinal images and accurately captures different stages of disease progression, subtle anatomical variations, and specific lesion types. In extensive experiments, our method demonstrates superior performance across multiple datasets, with 62.07% of text-driven synthetic CFPs judged indistinguishable from real ones by ophthalmologists. Moreover, the synthetic data improves accuracy by 5%-10% in diabetic retinopathy grading and glaucoma detection. Code is available at https://github.com/uni-medical/retina-text2cfp.
