CV · Oct 8, 2025

Evaluating Fundus-Specific Foundation Models for Diabetic Macular Edema Detection

Franco Javier Arellano, José Ignacio Orlando

arXiv:2510.07277v1 · 3.6 · h-index: 22 · SIPAIM

Originality: Synthesis-oriented

AI Analysis

This work addresses automated DME detection in ophthalmology, but it is incremental: it evaluates existing methods without introducing new techniques.

The paper addresses the detection of Diabetic Macular Edema (DME) from fundus images by comparing two foundation models (RETFound and FLAIR) with a standard CNN (EfficientNet-B0) across multiple datasets. The CNN often outperformed the foundation models: EfficientNet-B0 ranked first or second in most settings, while FLAIR showed competitive zero-shot performance only in specific cases.

Diabetic Macular Edema (DME) is a leading cause of vision loss among patients with Diabetic Retinopathy (DR). While deep learning has shown promising results for automatically detecting this condition from fundus images, its application remains challenging due to the limited availability of annotated data. Foundation Models (FM) have emerged as an alternative solution. However, it is unclear if they can cope with DME detection in particular. In this paper, we systematically compare different FM and standard transfer learning approaches for this task. Specifically, we compare the two most popular FM for retinal images--RETFound and FLAIR--and an EfficientNet-B0 backbone, across different training regimes and evaluation settings in IDRiD, MESSIDOR-2 and OCT-and-Eye-Fundus-Images (OEFI). Results show that despite their scale, FM do not consistently outperform fine-tuned CNNs in this task. In particular, an EfficientNet-B0 ranked first or second in terms of area under the ROC and precision/recall curves in most evaluation settings, with RETFound only showing promising results in OEFI. FLAIR, on the other hand, demonstrated competitive zero-shot performance, achieving notable AUC-PR scores when prompted appropriately. These findings reveal that FM might not be a good tool for fine-grained ophthalmic tasks such as DME detection even after fine-tuning, suggesting that lightweight CNNs remain strong baselines in data-scarce environments.
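
The abstract ranks models by area under the ROC and precision/recall curves. As a rough illustration of what those two numbers measure, here is a minimal Python sketch. It is not the authors' evaluation code: the function names `roc_auc` and `average_precision` and the toy labels and scores are hypothetical, and average precision is only one common way to summarize a precision/recall curve, so the paper's AUC-PR may be computed differently.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    positive case scores higher than a random negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision values taken at the rank of
    each positive case. Assumes scores are not tied (illustrative only)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


# Hypothetical toy data: 1 = DME present, 0 = DME absent.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(f"AUC-ROC: {roc_auc(toy_labels, toy_scores):.3f}")
print(f"Average precision: {average_precision(toy_labels, toy_scores):.3f}")
```

AUC-ROC depends only on how positives rank relative to negatives, while average precision is also sensitive to how rare the positive class is. This is one reason to report both when positive cases are scarce.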
