CV AI | Apr 30, 2024

Training a high-performance retinal foundation model with half-the-data and 400 times less compute

arXiv:2405.00117v29.615 citations | h-index: 8 | Nat Commun

Originality Highly original

AI Analysis

RETFound-Green addresses the high computational and environmental costs that researchers and practitioners in medical imaging face, offering a more accessible and efficient alternative to existing retinal foundation models.

The paper tackles the resource-intensive training and deployment of retinal foundation models in medical AI by proposing RETFound-Green, which uses a novel Token Reconstruction objective to achieve comparable performance with only 75,000 images and 400 times less compute. It performs best in 68 out of 119 downstream comparisons.

Artificial Intelligence in medicine is traditionally limited by the lack of massive training datasets. Foundation models, pre-trained models that can be adapted to downstream tasks with small datasets, could alleviate this problem. Researchers at Moorfields Eye Hospital (MEH) proposed RETFound-MEH, a retinal foundation model trained on 900,000 images, including private hospital data. Recently, the data-efficient DERETFound was proposed, providing comparable performance while being trained on only 150,000 publicly available images. However, both of these models required very substantial resources to train initially and are resource-intensive in downstream use. We propose a novel Token Reconstruction objective that we use to train RETFound-Green, a retinal foundation model trained using only 75,000 publicly available images and 400 times less compute. We estimate the cost of training RETFound-MEH and DERETFound at $10,000 and $14,000, respectively. RETFound-Green could be trained for less than $100, with equally reduced environmental impact. RETFound-Green is also far more efficient in downstream use: it can be downloaded 14 times faster and computes vector embeddings 2.7 times faster, and those embeddings require 2.6 times less storage space. Despite this, RETFound-Green does not perform systematically worse. In fact, on various tasks across three downstream datasets from Brazil, India and China, it performs best on 68 tasks out of 119 comparisons, versus 21 for DERETFound and 13 for RETFound-MEH. Our results suggest that RETFound-Green is a very efficient, high-performance retinal foundation model. We anticipate that our Token Reconstruction objective could be scaled up for even higher performance and be applied to other domains beyond retinal imaging.
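The headline numbers in the abstract can be put on a common footing with a little arithmetic. The sketch below is illustrative only: it uses just the figures quoted above, the variable names are our own, and the cost ratios are lower bounds because RETFound-Green's training cost is stated only as "less than $100".

```python
# Figures quoted in the abstract; variable names are illustrative, not from the paper.
wins = {"RETFound-Green": 68, "DERETFound": 21, "RETFound-MEH": 13}
total_comparisons = 119

# Share of the 119 comparisons in which each model performed best.
win_share = {name: count / total_comparisons for name, count in wins.items()}

# Comparisons the abstract does not attribute to any single model.
unattributed = total_comparisons - sum(wins.values())

# Estimated training costs in USD. RETFound-Green's figure is an upper bound
# ("less than $100"), so the ratios below are lower bounds.
cost_usd = {"RETFound-MEH": 10_000, "DERETFound": 14_000}
green_cost_upper_bound = 100
min_cost_ratio = {
    name: cost / green_cost_upper_bound for name, cost in cost_usd.items()
}

for name, share in win_share.items():
    print(f"{name}: best on {share:.1%} of comparisons")
print(f"Comparisons not attributed to a single model: {unattributed}")
for name, ratio in min_cost_ratio.items():
    print(f"{name}: at least {ratio:.0f}x the training cost of RETFound-Green")
```

The three win counts sum to 102, so 17 of the 119 comparisons are not attributed to a single best model; the abstract does not say how those were resolved, and the sketch makes no assumption about them.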
