IV · CV · May 13, 2025

An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care

arXiv:2505.08414v14 citations · h-index: 132 · Cell Rep Med
Originality: Incremental advance
AI Analysis

Meta-EyeFM offers a valuable decision support tool for primary eye care by improving diagnostic performance and usability. The advance is incremental, however, because it builds on existing foundation models through fine-tuning.

The authors address the problem that task-specific deep learning models in eye care lack a user-friendly interface. They developed Meta-EyeFM, an integrated language-vision foundation model that routed fundus images to the appropriate vision foundation model with 100% accuracy and achieved ≥82.2% accuracy in disease detection. It was 11% to 43% more accurate than Gemini-1.5-flash and ChatGPT-4o in detecting various eye diseases and was comparable to an ophthalmologist.

Current deep learning models are mostly task specific and lack a user-friendly interface to operate. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM leverages a routing mechanism to enable accurate task-specific analysis based on text queries. Using Low-Rank Adaptation, we fine-tuned our VFMs to detect ocular and systemic diseases, differentiate ocular disease severity, and identify common ocular signs. The model achieved 100% accuracy in routing fundus images to appropriate VFMs, which achieved $\ge$ 82.2% accuracy in disease detection, $\ge$ 89% in severity differentiation, and $\ge$ 76% in sign identification. Meta-EyeFM was 11% to 43% more accurate than Gemini-1.5-flash and ChatGPT-4o LMMs in detecting various eye diseases and comparable to an ophthalmologist. This system offers enhanced usability and diagnostic performance, making it a valuable decision support tool for primary eye care or an online LLM for fundus evaluation.
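The routing idea in the abstract (a text query decides which task-specific vision model analyses the image) can be illustrated with a small dispatch sketch. This is not the authors' implementation: the paper uses an LLM as the router, whereas the hypothetical `route_query` below substitutes simple keyword matching, and the task labels, keyword lists, and placeholder model functions are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical task labels, loosely mirroring the three task families
# named in the abstract (detection, severity, sign identification).
TASK_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "severity": ("severity", "stage", "grade"),
    "sign": ("sign", "lesion", "haemorrhage", "hemorrhage"),
    "detection": ("detect", "screen", "disease", "have"),
}


def route_query(query: str) -> str:
    """Keyword stand-in for the LLM router: map a text query to a task label."""
    text = query.lower()
    for task, keywords in TASK_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return task
    return "detection"  # assumed fallback when nothing matches


# Placeholder "vision foundation models": each just reports which model
# would run. Real VFMs would load the image and return a prediction.
def detect_disease(image_path: str) -> str:
    return f"[detection model] analysing {image_path}"


def grade_severity(image_path: str) -> str:
    return f"[severity model] analysing {image_path}"


def identify_signs(image_path: str) -> str:
    return f"[sign model] analysing {image_path}"


# Registry that maps each task label to the model that handles it.
VFM_REGISTRY: Dict[str, Callable[[str], str]] = {
    "detection": detect_disease,
    "severity": grade_severity,
    "sign": identify_signs,
}


def answer(query: str, image_path: str) -> str:
    """Route the query, then run the selected model on the fundus image."""
    task = route_query(query)
    return VFM_REGISTRY[task](image_path)


if __name__ == "__main__":
    print(answer("Grade the severity of diabetic retinopathy", "fundus.png"))
```

Keeping the router separate from the model registry means a keyword matcher, a classifier, or an LLM call could be swapped in behind `route_query` without touching the task-specific models.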
