CV | Aug 20, 2024

ViLReF: An Expert Knowledge Enabled Vision-Language Retinal Foundation Model

arXiv:2408.10894v4 | 9 citations | h-index: 26 | Has Code
Originality: Incremental advance
AI Analysis

This work aims to improve AI-assisted retinal disease diagnosis for medical professionals. It is rated incremental because it builds on existing vision-language pre-training methods and adds domain-specific adaptations.

The authors address subtle semantic differences and false negative samples in retinal image-text pairs by developing ViLReF, a vision-language retinal foundation model pre-trained on 451,956 image-report pairs. The model shows strong zero-shot and transfer learning performance on downstream classification and segmentation tasks.

Subtle semantic differences in retinal image and text data present great challenges for pre-training vision-language models. Moreover, false negative samples, i.e., image-text pairs having the same semantics but incorrectly regarded as negatives, disrupt the vision-language pre-training process and affect the model's learning ability. This work aims to develop a retinal foundation model, called ViLReF, by pre-training on a paired dataset comprising 451,956 retinal images and corresponding diagnostic text reports. In our vision-language pre-training strategy, we leverage expert knowledge to facilitate the extraction of labels and propose a novel constraint, the Weighted Similarity Coupling Loss, to dynamically adjust the speed at which sample pairs are pushed apart within the feature space. Furthermore, we employ a batch expansion module with dynamic memory queues, maintained by momentum encoders, to supply extra samples and compensate for the vacancies caused by eliminating false negatives. Extensive experiments are conducted on multiple datasets for downstream classification and segmentation tasks. The experimental results demonstrate the powerful zero-shot and transfer learning capabilities of ViLReF, verifying the effectiveness of our pre-training strategy. Our ViLReF model is available at: https://github.com/T6Yang/ViLReF.
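To make the ideas of false-negative elimination and batch expansion more concrete, here is a minimal NumPy sketch. It is not the ViLReF implementation and does not reproduce the Weighted Similarity Coupling Loss, whose form is not given in the abstract. It assumes each sample carries a single integer label (the real labels are extracted with expert knowledge and may be richer), and the names `masked_info_nce`, `momentum_update`, and `enqueue` are hypothetical helpers chosen for illustration.

```python
import numpy as np

def masked_info_nce(img, txt, labels, queue_txt=None, queue_labels=None,
                    temperature=0.07):
    """Image-to-text InfoNCE that drops same-label pairs from the negatives.

    Assumption: one integer label per sample; equal labels mean equal semantics.
    """
    keys, key_labels = txt, labels
    if queue_txt is not None:
        # Batch expansion: append queued text features as extra candidates.
        keys = np.concatenate([txt, queue_txt], axis=0)
        key_labels = np.concatenate([labels, queue_labels], axis=0)

    # L2-normalise so the dot product is a cosine similarity.
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    keys = keys / np.linalg.norm(keys, axis=1, keepdims=True)

    b = img.shape[0]
    logits = img @ keys.T / temperature            # shape (B, B + K)

    # A false negative is any non-paired key that shares the query's label.
    same = labels[:, None] == key_labels[None, :]
    positive = np.zeros_like(same)
    positive[np.arange(b), np.arange(b)] = True
    false_neg = same & ~positive
    logits = np.where(false_neg, -np.inf, logits)  # remove from the softmax

    # Numerically stable log-softmax; the diagonal is never masked.
    m = logits.max(axis=1, keepdims=True)
    log_prob = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(b), np.arange(b)].mean()

def momentum_update(online, target, m=0.999):
    """Exponential moving average of online weights into the momentum encoder."""
    return {k: m * target[k] + (1.0 - m) * online[k] for k in target}

def enqueue(queue, new, max_size):
    """First-in, first-out memory queue that keeps the newest max_size rows."""
    return np.concatenate([queue, new], axis=0)[-max_size:]
```

In this sketch, masking removes terms from the softmax denominator, and the queue adds candidates back, which mirrors the abstract's description of compensating for the vacancies left by eliminated false negatives.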

Code Implementations: 1 repo