CV · AI · May 18, 2024

EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging

arXiv:2405.11338v2 · 37 citations · h-index: 54
Originality: Incremental advance
AI Analysis

This work addresses the need for versatile AI models in ophthalmology to reduce annotation burdens and improve performance across diverse tasks, though it is incremental as it builds on existing foundation model concepts.

The paper tackles the limited clinical utility of ophthalmology AI by developing EyeFound, a multimodal foundation model that learns from unlabeled retinal images across 11 modalities. This enables efficient adaptation to new tasks, and the model outperforms previous models such as RETFound in diagnosing eye diseases, predicting systemic diseases, and zero-shot multimodal visual question answering.

Artificial intelligence (AI) is vital in ophthalmology, tackling tasks like diagnosis, classification, and visual question answering (VQA). However, existing AI models in this domain often require extensive annotation and are task-specific, limiting their clinical utility. While recent developments have brought about foundation models for ophthalmology, they are limited by the need to train separate weights for each imaging modality, preventing a comprehensive representation of multimodal features. This highlights the need for versatile foundation models capable of handling various tasks and modalities in ophthalmology. To address this gap, we present EyeFound, a multimodal foundation model for ophthalmic images. Unlike existing models, EyeFound learns generalizable representations from unlabeled multimodal retinal images, enabling efficient model adaptation across multiple applications. Trained on 2.78 million images from 227 hospitals across 11 ophthalmic modalities, EyeFound facilitates generalist representations and diverse multimodal downstream tasks, even for detecting challenging rare diseases. It outperforms the previous model RETFound in diagnosing eye diseases, predicting systemic disease incidents, and zero-shot multimodal VQA. EyeFound provides a generalizable solution to improve model performance and lessen the annotation burden on experts, facilitating widespread clinical AI applications for retinal imaging.

