An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care
Meta-EyeFM is a useful decision-support tool for primary eye care because it improves both diagnostic performance and usability. The contribution is incremental, however, since it builds on existing foundation models through fine-tuning.
The authors address a limitation of deep learning models in eye care: most are task specific and lack user-friendly interfaces. Their solution is Meta-EyeFM, an integrated language-vision foundation model. It routed fundus images to the appropriate vision model with 100% accuracy and reached ≥82.2% accuracy in disease detection. It was 11% to 43% more accurate than Gemini-1.5-flash and ChatGPT-4o, and its performance was comparable to an ophthalmologist's.
Current deep learning models are mostly task specific and lack a user-friendly interface. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM uses a routing mechanism to enable accurate task-specific analysis based on text queries. Using Low-Rank Adaptation (LoRA), we fine-tuned our VFMs to detect ocular and systemic diseases, differentiate ocular disease severity, and identify common ocular signs. The model achieved 100% accuracy in routing fundus images to the appropriate VFMs, which in turn achieved ≥82.2% accuracy in disease detection, ≥89% in severity differentiation, and ≥76% in sign identification. Meta-EyeFM was 11% to 43% more accurate than the large multimodal models (LMMs) Gemini-1.5-flash and ChatGPT-4o in detecting various eye diseases, and its accuracy was comparable to an ophthalmologist's. This system offers enhanced usability and diagnostic performance, making it a valuable decision-support tool for primary eye care or an online LLM for fundus evaluation.
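The routing idea can be illustrated with a toy dispatcher. A free-text question selects one task-specific vision model, and that model then analyzes the image. This is a minimal sketch under stated assumptions. The abstract does not describe how the LLM makes its routing decision, so a keyword count stands in for the LLM here. The task names (`disease_detection`, `severity_grading`, `sign_identification`), the keywords, and the placeholder "VFMs" are all hypothetical and are not Meta-EyeFM's actual components.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Route:
    """One hypothetical routing target: a task name and its trigger keywords."""
    task: str
    keywords: List[str]


# Hypothetical routes; the paper's actual task list and router are not given here.
ROUTES: List[Route] = [
    Route("disease_detection", ["disease", "detect", "diagnos"]),
    Route("severity_grading", ["severity", "severe", "grade", "stage"]),
    Route("sign_identification", ["sign", "hemorrhage", "haemorrhage", "exudate", "drusen"]),
]

DEFAULT_TASK = "disease_detection"


def route_query(query: str) -> str:
    """Pick the task whose keywords occur most often in the query.

    A keyword count stands in for the LLM-based router described in the paper.
    On a tie the earlier route wins; with no match we fall back to DEFAULT_TASK.
    """
    text = query.lower()
    best_task, best_hits = DEFAULT_TASK, 0
    for route in ROUTES:
        hits = sum(1 for kw in route.keywords if kw in text)
        if hits > best_hits:
            best_task, best_hits = route.task, hits
    return best_task


# Placeholder "VFMs": each only reports which model would have analyzed the image.
VFMS: Dict[str, Callable[[str], str]] = {
    task: (lambda image, t=task: f"{t} model analyzed {image}")
    for task in (r.task for r in ROUTES)
}


def answer(query: str, image_path: str) -> Tuple[str, str]:
    """Route a text query, then run the matching placeholder VFM on the image."""
    task = route_query(query)
    return task, VFMS[task](image_path)
```

For example, `answer("Grade the severity", "fundus_001.png")` is routed to `severity_grading`. A question with no matching keyword, such as "Does this fundus show glaucoma?", falls back to disease detection. In a real system the keyword table would be replaced by the LLM's own interpretation of the query. The separation shown here, where one component decides and specialized models execute, is the part this sketch is meant to convey.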
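The abstract states that the VFMs were fine-tuned with Low-Rank Adaptation but gives no configuration details. The sketch below shows the standard LoRA formulation, not the authors' exact setup. A frozen weight matrix W (d × k) is adapted by adding a scaled low-rank product (alpha / r) · B · A, where B is d × r and A is r × k. Only B and A are trained. The rank `r`, scale `alpha`, and choice of adapted layers used for Meta-EyeFM are unknown here, so the values below are illustrative only.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product of x (n x m) and y (m x p)."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def lora_merge(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, the standard LoRA weight update.

    w: frozen d x k weight; b: trainable d x r; a: trainable r x k.
    """
    r = len(a)  # LoRA rank = number of rows of A
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def trainable_params(d: int, k: int, r: int) -> Tuple[int, int]:
    """(full fine-tuning count, LoRA count) for a single d x k weight matrix."""
    return d * k, r * (d + k)
```

In the original LoRA formulation, B is initialized to zeros, so the merged weights start out identical to the pretrained ones and fine-tuning moves them gradually. The parameter count explains the method's efficiency. Adapting one 1024 × 1024 matrix at rank 8 trains 16,384 values instead of 1,048,576, which is roughly 1.6% of the full matrix.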