IV CV | Jun 21, 2024

Benchmarking Retinal Blood Vessel Segmentation Models for Cross-Dataset and Cross-Disease Generalization

Jeremiah Fadugba, Patrick Köhler, Lisa Koch, Petru Manescu, Philipp Berens

arXiv:2406.14994v1 | 8.52 citations | h-index: 46 | Has Code

Originality Synthesis-oriented

AI Analysis

The study provides practical guidance for clinical settings on model selection and dataset curation, but it is incremental because it benchmarks existing methods on new data.

The study benchmarked retinal blood vessel segmentation models on the largest dataset published to date. It found that, given sufficient data, basic architectures such as U-Net perform as well as more advanced ones, and that image quality is a key factor affecting segmentation outcomes.

Retinal blood vessel segmentation can extract clinically relevant information from fundus images. As manual tracing is cumbersome, algorithms based on Convolutional Neural Networks have been developed. Such studies have used small publicly available datasets for training and measuring performance, running the risk of overfitting. Here, we provide a rigorous benchmark for various architectural and training choices commonly used in the literature on the largest dataset published to date. We train and evaluate five published models on the publicly available FIVES fundus image dataset, which exceeds previous ones in size and quality and which also contains images from common ophthalmological conditions (diabetic retinopathy, age-related macular degeneration, glaucoma). We compare the performance of different model architectures across different loss functions, levels of image quality and ophthalmological conditions and assess their ability to perform well in the face of disease-induced domain shifts. Given sufficient training data, basic architectures such as U-Net perform just as well as more advanced ones, and transfer across disease-induced domain shifts typically works well for most architectures. However, we find that image quality is a key factor determining segmentation outcomes. When optimizing for segmentation performance, investing in a well-curated dataset to train a standard architecture yields better results than tuning a sophisticated architecture on a smaller dataset or one with lower image quality. We distilled the utility of architectural advances in terms of their clinical relevance, thereby providing practical guidance for model choices depending on the circumstances of the clinical setting.
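
The abstract does not say which metric is used to compare segmentation performance across conditions. As a rough illustration only, the sketch below assumes the Dice coefficient, a common overlap measure for binary segmentation, and averages it per condition. The function names, the group labels, and the toy masks are hypothetical and are not taken from the paper or its code.

```python
from collections import defaultdict


def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps keeps the ratio defined when both masks are empty
    return (2.0 * intersection + eps) / (total + eps)


def mean_dice_by_group(samples):
    """Average Dice per group; samples are (group_label, pred_mask, target_mask)."""
    scores = defaultdict(list)
    for group, pred, target in samples:
        scores[group].append(dice_score(pred, target))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}


# Toy data with hypothetical group labels; real masks would be full images.
toy_samples = [
    ("normal", [1, 1, 0, 0], [1, 1, 0, 0]),
    ("glaucoma", [1, 1, 0, 0], [1, 0, 0, 0]),
]
per_group = mean_dice_by_group(toy_samples)
```

Grouping scores by condition in this way is one simple means of checking whether a model trained on one group transfers to another; the paper's own evaluation protocol may differ.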
