CV Oct 18, 2018

Deep Learning vs. Human Graders for Classifying Severity Levels of Diabetic Retinopathy in a Real-World Nationwide Screening Program

Paisan Raumviboonsuk, Jonathan Krause, Peranut Chotcomwongse, Rory Sayres, Rajiv Raman, Kasumi Widner, Bilson J L Campana, Sonia Phene, Kornwipa Hemarat, Mongkol Tadarati, Sukhum Silpa-Acha, Jirawut Limwattanayingyong

arXiv:1810.08290v1 1.714 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of scalable and accurate diabetic retinopathy screening for patients in real-world clinical settings, though it is incremental as it applies an existing method to new data.

The study validated a deep learning algorithm for classifying diabetic retinopathy severity on 25,326 gradable retinal images from a nationwide screening program in Thailand, finding that it significantly reduced the false negative rate (by 23%) while slightly increasing the false positive rate (by 2%) compared with human graders.

Deep learning algorithms have been used to detect diabetic retinopathy (DR) with specialist-level accuracy. This study aims to validate one such algorithm on a large-scale clinical population, and compare the algorithm performance with that of human graders. 25,326 gradable retinal images of patients with diabetes from the community-based, nation-wide screening program of DR in Thailand were analyzed for DR severity and referable diabetic macular edema (DME). Grades adjudicated by a panel of international retinal specialists served as the reference standard. Across different severity levels of DR for determining referable disease, deep learning significantly reduced the false negative rate (by 23%) at the cost of slightly higher false positive rates (2%). Deep learning algorithms may serve as a valuable tool for DR screening.
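The two error rates compared in the abstract can be illustrated with a minimal sketch. It assumes each image has already been reduced to a binary referable / non-referable call; the function name `error_rates` and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def error_rates(reference: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (false_negative_rate, false_positive_rate).

    True means "referable disease"; `reference` plays the role of the
    adjudicated reference standard, `predicted` the grader or algorithm.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    fn = sum(1 for r, p in pairs if r and not p)      # referable, but missed
    tp = sum(1 for r, p in pairs if r and p)          # referable, and flagged
    fp = sum(1 for r, p in pairs if not r and p)      # non-referable, but flagged
    tn = sum(1 for r, p in pairs if not r and not p)  # non-referable, and cleared

    if fn + tp == 0 or fp + tn == 0:
        raise ValueError("need at least one referable and one non-referable case")

    return fn / (fn + tp), fp / (fp + tn)


# Toy labels for illustration only (hypothetical, not study data).
reference = [True, True, True, True, False, False, False, False, False, False]
grader = [True, True, False, False, False, False, False, False, False, False]
algorithm = [True, True, True, False, True, False, False, False, False, False]

grader_fnr, grader_fpr = error_rates(reference, grader)
algo_fnr, algo_fpr = error_rates(reference, algorithm)
print(f"grader:    FNR={grader_fnr:.2f}  FPR={grader_fpr:.2f}")
print(f"algorithm: FNR={algo_fnr:.2f}  FPR={algo_fpr:.2f}")
```

In this toy case the algorithm misses fewer referable cases than the grader but flags one more non-referable case, which is the same kind of trade-off the abstract describes.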
