IV · CV · LG · Sep 23, 2024

AI Workflow, External Validation, and Development in Eye Disease Diagnosis

arXiv:2409.15087v2 · 11 citations · h-index: 54
Originality: Incremental advance
AI Analysis

The paper addresses the problem of insufficient clinical validation of medical AI, specifically for eye disease diagnosis, by demonstrating real-world improvements in clinicians' diagnostic accuracy and efficiency.

This study tackled the challenge of applying AI to real-world medical diagnosis by developing an AI-assisted workflow for the diagnosis and severity classification of age-related macular degeneration (AMD). AI assistance improved clinicians' average F1-score by 20% (from 37.71 to 45.52) and reduced diagnostic time by up to 40%.
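Reading the 20% figure as a relative change rather than a gain in absolute F1 points (an assumption; the absolute gain is 7.81 points), the arithmetic works out as:

$$
\Delta_{\mathrm{rel}} = \frac{F1_{\text{Manual+AI}} - F1_{\text{Manual}}}{F1_{\text{Manual}}} \times 100\% = \frac{45.52 - 37.71}{37.71} \times 100\% = \frac{7.81}{37.71} \times 100\% \approx 20.7\%
$$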

Timely disease diagnosis is challenging due to increasing disease burdens and limited clinician availability. AI shows promise in diagnostic accuracy but faces real-world application issues due to insufficient validation in clinical workflows and diverse populations. This study addresses gaps in the downstream accountability of medical AI through a case study on age-related macular degeneration (AMD) diagnosis and severity classification. We designed and implemented an AI-assisted diagnostic workflow for AMD and compared diagnostic performance with and without AI assistance among 24 clinicians from 12 institutions, using real patient data sampled from the Age-Related Eye Disease Study (AREDS). Additionally, we demonstrated continual enhancement of an existing AI model by incorporating approximately 40,000 additional medical images (the AREDS2 dataset). The improved model was then systematically evaluated on both the AREDS and AREDS2 test sets, as well as on an external test set from Singapore. AI assistance markedly enhanced diagnostic accuracy and classification for 23 of 24 clinicians, with the average F1-score increasing by 20%, from 37.71 (Manual) to 45.52 (Manual + AI) (P < 0.0001), and improvements of over 50% in some cases. In terms of efficiency, AI assistance reduced diagnostic time for 17 of the 19 clinicians tracked, with time savings of up to 40%. Furthermore, a model equipped with continual learning showed robust performance across three independent datasets, achieving a 29% increase in accuracy and raising the F1-score from 42 to 54 in the Singapore population.
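The abstract reports F1-scores on a 0-100 scale but does not say how they were averaged across AMD severity classes. The sketch below assumes a macro-averaged F1 (the unweighted mean of per-class F1), a common choice for multi-class grading, and compares one clinician's manual and AI-assisted grades against reference labels. The function names, the 0-3 severity coding, and the example grades are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Macro-averaged F1 on a 0-100 scale (assumed averaging scheme).

    Per-class F1 = 2*TP / (2*TP + FP + FN); the macro score is the
    unweighted mean over every class seen in y_true or y_pred.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gets a false positive
            fn[truth] += 1  # the reference class gets a false negative
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    return 100.0 * sum(per_class) / len(per_class)


def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


# Hypothetical severity grades (0-3) for 8 images, graded by one clinician
# without and with AI assistance; these are NOT data from the study.
reference = [0, 0, 1, 1, 2, 2, 3, 3]
manual = [0, 1, 1, 0, 2, 3, 3, 2]
assisted = [0, 0, 1, 1, 2, 3, 3, 3]

f1_manual = macro_f1(reference, manual)
f1_assisted = macro_f1(reference, assisted)
print(f"manual={f1_manual:.2f} assisted={f1_assisted:.2f} "
      f"change={relative_change(f1_manual, f1_assisted):.1f}%")
# prints: manual=50.00 assisted=86.67 change=73.3%
```

Applied to the reported Singapore figures, `relative_change(42, 54)` gives about 28.6%. Where scikit-learn is available, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same unweighted per-class average and could stand in for the hand-written `macro_f1`.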
