IV · AI · CV · Feb 10, 2025

Is an Ultra Large Natural Image-Based Foundation Model Superior to a Retina-Specific Model for Detecting Ocular and Systemic Diseases?

arXiv:2502.06289v2 · 2 citations · h-index: 54 · Has Code · Ophthalmology Science
Originality: Highly original
AI Analysis

This research tackles the problem of selecting the most suitable foundation model for ophthalmology tasks, which is significant for clinicians and researchers in the field who need to optimize clinical performance.

The study compared a general-purpose foundation model (DINOv2) with a retina-specific model (RETFound) for detecting ocular diseases and predicting systemic diseases. DINOv2 outperformed RETFound in detecting diabetic retinopathy (AUROC 0.850-0.952 vs 0.823-0.944 across three datasets), multi-class eye diseases (AUROC 0.892 vs 0.846), and glaucoma (AUROC 0.958 vs 0.940). RETFound performed better in predicting heart failure, myocardial infarction, and ischaemic stroke (AUROC 0.732-0.796 vs 0.663-0.771).

The advent of foundation models (FMs) is transforming the medical domain. In ophthalmology, RETFound, a retina-specific FM pre-trained sequentially on 1.4 million natural images and 1.6 million retinal images, has demonstrated high adaptability across clinical applications. Conversely, DINOv2, a general-purpose vision FM pre-trained on 142 million natural images, has shown promise in non-medical domains, but its applicability to clinical tasks remains underexplored. To address this, we conducted head-to-head evaluations by fine-tuning RETFound and three DINOv2 models (large, base, small) for ocular disease detection and systemic disease prediction tasks, across eight standardized open-source ocular datasets as well as the Moorfields AlzEye and UK Biobank datasets. The DINOv2-large model outperformed RETFound in detecting diabetic retinopathy (AUROC=0.850-0.952 vs 0.823-0.944 across three datasets, all P<=0.007) and multi-class eye diseases (AUROC=0.892 vs 0.846, P<0.001). For glaucoma, the DINOv2-base model outperformed RETFound (AUROC=0.958 vs 0.940, P<0.001). Conversely, RETFound achieved superior performance over all DINOv2 models in predicting heart failure, myocardial infarction, and ischaemic stroke (AUROC=0.732-0.796 vs 0.663-0.771, all P<0.001). These trends persisted even when only 10% of the fine-tuning data was used. These findings showcase the distinct scenarios where general-purpose and domain-specific FMs excel, highlighting the importance of aligning FM selection with task-specific requirements to optimise clinical performance.
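The comparisons above rest on two quantities: each model's AUROC on a shared test set, and a P value for the difference between two models. The abstract does not say which statistical test the authors used, so the following is only a minimal sketch of one common option, a paired bootstrap over test cases. The function names (auroc, paired_bootstrap_auroc_diff) and the parameters n_boot and seed are illustrative assumptions, not taken from the paper or its code.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a positive case outscores a negative one.

    Equivalent to the normalised Mann-Whitney U statistic; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auroc_diff(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """Compare two models scored on the same test cases.

    Returns the observed AUROC difference (A minus B) and a two-sided
    bootstrap p-value. This is an assumed procedure, not the paper's.
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = auroc(labels, scores_a) - auroc(labels, scores_b)
    diffs = []
    while len(diffs) < n_boot:
        # Resample case indices with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that contain a single class
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auroc(y, a) - auroc(y, b))
    # Two-sided p-value: twice the smaller tail mass at or beyond zero.
    frac_le = sum(d <= 0 for d in diffs) / n_boot
    frac_ge = sum(d >= 0 for d in diffs) / n_boot
    return observed, min(1.0, 2 * min(frac_le, frac_ge))
```

In use, labels would hold the binary ground truth for one task (for example, referable diabetic retinopathy), and scores_a and scores_b would hold the predicted probabilities from the two fine-tuned models on the same images. The pairwise AUROC loop is quadratic in the number of cases, so for test sets of realistic size a rank-based implementation from an established library would be the more practical choice.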
