CV · AI · Dec 23, 2024

FFA Sora, video generation as fundus fluorescein angiography simulator

Xinyuan Wu, Lili Wang, Ruoyu Chen, Bowen Liu, Weiyi Zhang, Xi Yang, Yifan Feng, Mingguang He, Danli Shi

arXiv:2412.17346v1 · 3.72 citations · h-index: 15

Originality: Incremental advance

AI Analysis

FFA Sora addresses privacy concerns in sharing large-scale FFA data and supports medical education for beginners in retinal disease diagnosis.

The study addresses the difficulty beginners have in interpreting fundus fluorescein angiography (FFA) images by developing FFA Sora, a text-to-video model that converts FFA reports into dynamic videos. It reports FVD = 329.78, LPIPS = 0.48, and VQAScore = 0.61.

Fundus fluorescein angiography (FFA) is critical for diagnosing retinal vascular diseases, but beginners often struggle with image interpretation. This study develops FFA Sora, a text-to-video model that converts FFA reports into dynamic videos via a Wavelet-Flow Variational Autoencoder (WF-VAE) and a diffusion transformer (DiT). Trained on an anonymized dataset, FFA Sora accurately simulates disease features from the input text, as confirmed by objective metrics: Fréchet Video Distance (FVD) = 329.78, Learned Perceptual Image Patch Similarity (LPIPS) = 0.48, and Visual-question-answering Score (VQAScore) = 0.61. Specific evaluations showed acceptable alignment between the generated videos and textual prompts, with a BERTScore of 0.35. Additionally, the model demonstrated strong privacy-preserving performance in retrieval evaluations, achieving an average Recall@K of 0.073. Human assessments indicated satisfactory visual quality, with an average score of 1.570 (scale: 1 = best, 5 = worst). This model addresses privacy concerns associated with sharing large-scale FFA data and enhances medical education.
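The abstract reports an average Recall@K from a retrieval evaluation but does not describe the protocol. As a rough illustration only, the sketch below shows how Recall@K is commonly computed from a similarity matrix. The function names `recall_at_k` and `mean_recall`, the similarity-matrix input, and the choice of K values are assumptions made for this example and are not taken from the paper. Under this setup, a low value means a generated video rarely ranks its source sample among the top K matches, which is the sense in which a low score suggests privacy preservation.

```python
from typing import Sequence


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    true_index: Sequence[int],
    k: int,
) -> float:
    """Fraction of queries whose true match ranks in the top k by similarity.

    similarity[i][j] is the (assumed) similarity between query i, e.g. a
    generated video, and gallery item j, e.g. a real video.
    true_index[i] is the gallery index of the true match for query i.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not similarity or len(similarity) != len(true_index):
        raise ValueError("need one true index per non-empty query row")

    hits = 0
    for row, target in zip(similarity, true_index):
        # Gallery indices ordered from most to least similar.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(similarity)


def mean_recall(
    similarity: Sequence[Sequence[float]],
    true_index: Sequence[int],
    ks: Sequence[int] = (5, 10, 50),  # hypothetical K values
) -> float:
    """Average Recall@K over several values of K."""
    return sum(recall_at_k(similarity, true_index, k) for k in ks) / len(ks)


if __name__ == "__main__":
    # Toy data: 3 queries against a 3-item gallery.
    sim = [
        [0.9, 0.1, 0.2],
        [0.3, 0.2, 0.8],
        [0.5, 0.6, 0.1],
    ]
    truth = [0, 1, 2]
    print(recall_at_k(sim, truth, 1))
    print(mean_recall(sim, truth, ks=(1, 2, 3)))
```

In the toy data only the first query ranks its true match first, so Recall@1 is one third, while Recall@3 is trivially 1.0 because the gallery has only three items.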
