IV · CV · May 13, 2025

An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care

Zhi Da Soh, Yang Bai, Kai Yu, Yang Zhou, Xiaofeng Lei, Sahil Thakur, Zann Lee, Lee Ching Linette Phang, Qingsheng Peng, Can Can Xue, Rachel Shujuan Chong, Quan V. Hoang

arXiv:2505.08414v1 · 4 citations · h-index: 132 · Cell Rep Med

Originality: Incremental advance

AI Analysis

This work provides a valuable decision support tool for primary eye care by improving diagnostic performance and usability, though the advance is incremental: it builds on existing foundation models through fine-tuning.

The authors address the problem that deep learning models in eye care are task specific and lack user-friendly interfaces. They developed Meta-EyeFM, an integrated language-vision foundation model that achieved 100% accuracy in routing fundus images and ≥82.2% accuracy in disease detection, was 11% to 43% more accurate than Gemini-1.5-flash and ChatGPT-4o, and was comparable to an ophthalmologist.

Current deep learning models are mostly task specific and lack a user-friendly interface to operate. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM leverages a routing mechanism to enable accurate task-specific analysis based on text queries. Using Low-Rank Adaptation (LoRA), we fine-tuned our VFMs to detect ocular and systemic diseases, differentiate ocular disease severity, and identify common ocular signs. The model achieved 100% accuracy in routing fundus images to appropriate VFMs, which achieved $\ge$ 82.2% accuracy in disease detection, $\ge$ 89% in severity differentiation, and $\ge$ 76% in sign identification. Meta-EyeFM was 11% to 43% more accurate than the Gemini-1.5-flash and ChatGPT-4o large multimodal models (LMMs) in detecting various eye diseases and comparable to an ophthalmologist. This system offers enhanced usability and diagnostic performance, making it a valuable decision support tool for primary eye care or an online LLM for fundus evaluation.
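The routing idea in the abstract (a text query selects which task-specific vision model analyses the fundus image) can be illustrated with a minimal sketch. This is not the authors' implementation: Meta-EyeFM uses an LLM to route, whereas the sketch below substitutes simple keyword matching, and every name in it (`ROUTES`, `route`, `answer`, the three stub functions, and the keyword lists) is a hypothetical stand-in chosen for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stubs standing in for the fine-tuned vision foundation models.
# Real models would load an image and return a prediction.
def detect_disease(image_path: str) -> str:
    return f"[disease-detection VFM] would analyse {image_path}"

def grade_severity(image_path: str) -> str:
    return f"[severity VFM] would analyse {image_path}"

def identify_signs(image_path: str) -> str:
    return f"[sign-identification VFM] would analyse {image_path}"

# Assumed keyword lists; the paper's router is an LLM, not keyword matching.
# Order matters: the first task whose keywords match the query wins.
ROUTES: Dict[str, Tuple[Tuple[str, ...], Callable[[str], str]]] = {
    "severity": (("severity", "stage", "grade"), grade_severity),
    "signs": (("sign", "lesion", "haemorrhage", "hemorrhage"), identify_signs),
    "detection": (("detect", "screen", "have", "disease"), detect_disease),
}

def route(query: str) -> str:
    """Return the task name whose keywords first match the query."""
    q = query.lower()
    for task, (keywords, _model) in ROUTES.items():
        if any(keyword in q for keyword in keywords):
            return task
    # Fall back to disease detection when nothing matches.
    return "detection"

def answer(query: str, image_path: str) -> str:
    """Route the query, then call the selected model stub on the image."""
    _keywords, model = ROUTES[route(query)]
    return model(image_path)

if __name__ == "__main__":
    print(answer("What is the severity of diabetic retinopathy here?", "fundus.png"))
```

Keeping the router separate from the models means each vision model can be fine-tuned or replaced independently, which is the usability benefit the abstract attributes to the integrated design.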
