IV · AI · CV · Nov 1, 2024

Cross-Fundus Transformer for Multi-modal Diabetic Retinopathy Grading with Cataract

arXiv:2411.00726v1 · 3 citations · h-index: 24 · Has Code
Originality: Incremental advance
AI Analysis

This work aims at more accurate grading of diabetic retinopathy, a leading cause of blindness, in support of medical diagnosis. It is an incremental advance because it builds on existing multi-modal and transformer methods.

The paper tackles diabetic retinopathy grading by fusing color and infrared fundus images in a multi-modal deep learning framework. The authors report that it outperforms the methods they compare against on a clinical dataset of 1,713 image pairs.

Diabetic retinopathy (DR) is a leading cause of blindness worldwide and a common complication of diabetes. As two different imaging tools for DR grading, color fundus photography (CFP) and infrared fundus photography (IFP) are highly correlated and complementary in clinical applications. To the best of our knowledge, this is the first study that explores a novel multi-modal deep learning framework to fuse the information from CFP and IFP towards more accurate DR grading. Specifically, we construct a dual-stream architecture, Cross-Fundus Transformer (CFT), to fuse the ViT-based features of the two fundus image modalities. In particular, a meticulously engineered Cross-Fundus Attention (CFA) module is introduced to capture the correspondence between CFP and IFP images. Moreover, we adopt both single-modality and multi-modality supervision to maximize the overall performance for DR grading. Extensive experiments on a clinical dataset consisting of 1,713 pairs of multi-modal fundus images demonstrate the superiority of our proposed method. Our code will be released for public access.
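
The abstract does not spell out how the CFA module or the combined supervision is computed, so the following is only a minimal, dependency-free sketch of the general idea under stated assumptions: generic single-head scaled dot-product cross-attention in which CFP tokens query IFP tokens, plus a joint loss that sums cross-entropy over a CFP head, an IFP head, and a fused head. The function names, the token counts, the embedding size, the five-class example, and the equal loss weighting are all hypothetical choices for illustration, not details taken from the paper.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: queries come from one modality,
    keys and values from the other (assumed form, not the paper's CFA)."""
    q = matmul(query_tokens, w_q)
    k = matmul(context_tokens, w_k)
    v = matmul(context_tokens, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights


def cross_entropy(logits, label):
    """Cross-entropy of one sample given raw logits and an integer label."""
    return -math.log(softmax(logits)[label])


def joint_loss(logits_cfp, logits_ifp, logits_fused, label):
    """Single-modality plus multi-modality supervision.
    Equal weighting of the three terms is an assumption."""
    return (cross_entropy(logits_cfp, label)
            + cross_entropy(logits_ifp, label)
            + cross_entropy(logits_fused, label))


# Toy data: 4 CFP tokens and 6 IFP tokens of width 8 (hypothetical sizes).
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


dim = 8
cfp_tokens = rand_matrix(4, dim)
ifp_tokens = rand_matrix(6, dim)
w_q, w_k, w_v = (rand_matrix(dim, dim) for _ in range(3))

# CFP tokens attend to IFP tokens; swapping the first two arguments
# gives the IFP-to-CFP direction.
cfp_enriched, attn = cross_attention(cfp_tokens, ifp_tokens, w_q, w_k, w_v)
```

In this sketch `cfp_enriched` has one row per CFP token, and each row of `attn` is a distribution over the IFP tokens. A real dual-stream model would apply the same operation in both directions inside learned ViT blocks and train all three classification heads together.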

