SA-UNetv2: Rethinking Spatial Attention U-Net for Retinal Vessel Segmentation
This work addresses retinal vessel segmentation for early disease diagnosis, offering an incremental improvement in efficiency for resource-constrained settings.
The paper tackles retinal vessel segmentation by proposing SA-UNetv2, a lightweight model that improves multi-scale feature fusion and handles class imbalance. On the DRIVE and STARE datasets it achieves state-of-the-art performance with 1.2 MB of memory and 0.26M parameters.
Retinal vessel segmentation is essential for early diagnosis of diseases such as diabetic retinopathy, hypertension, and neurodegenerative disorders. Although SA-UNet introduces spatial attention in the bottleneck, it underuses attention in skip connections and does not address the severe foreground-background imbalance. We propose SA-UNetv2, a lightweight model that injects cross-scale spatial attention into all skip connections to strengthen multi-scale feature fusion and adopts a weighted Binary Cross-Entropy (BCE) plus Matthews Correlation Coefficient (MCC) loss to improve robustness to class imbalance. On the public DRIVE and STARE datasets, SA-UNetv2 achieves state-of-the-art performance with only 1.2 MB of memory and 0.26M parameters (less than 50% of SA-UNet's), and 1-second CPU inference on 592 × 592 × 3 images, demonstrating strong efficiency and deployability in resource-constrained, CPU-only settings.
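To make the loss concrete, the following is a minimal NumPy sketch of one way a weighted BCE term and an MCC-based term could be combined. The abstract does not give the exact formulation, so the function names, the `pos_weight` and `lam` parameters, the soft (probability-based) confusion-matrix counts, and the use of 1 - MCC as the penalty are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def weighted_bce(probs, targets, pos_weight=1.0, eps=1e-7):
    """Binary cross-entropy with extra weight on the positive (vessel) class.

    pos_weight is a hypothetical parameter; the paper's weighting may differ.
    """
    p = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    per_pixel = -(pos_weight * targets * np.log(p)
                  + (1.0 - targets) * np.log(1.0 - p))
    return float(per_pixel.mean())


def soft_mcc_loss(probs, targets, eps=1e-7):
    """1 - MCC, with confusion-matrix counts computed from probabilities."""
    tp = np.sum(probs * targets)
    tn = np.sum((1.0 - probs) * (1.0 - targets))
    fp = np.sum(probs * (1.0 - targets))
    fn = np.sum((1.0 - probs) * targets)
    numerator = tp * tn - fp * fn
    denominator = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) + eps
    return float(1.0 - numerator / denominator)


def combined_loss(probs, targets, pos_weight=1.0, lam=1.0):
    """Weighted BCE plus lam * (1 - MCC); lam is an assumed mixing weight."""
    return (weighted_bce(probs, targets, pos_weight)
            + lam * soft_mcc_loss(probs, targets))


if __name__ == "__main__":
    # Toy 2x2 "image" with one vessel pixel, mimicking foreground scarcity.
    targets = np.array([[1.0, 0.0], [0.0, 0.0]])
    probs = np.array([[0.7, 0.2], [0.1, 0.3]])
    print(combined_loss(probs, targets, pos_weight=3.0))
```

Because MCC uses all four confusion-matrix entries, a prediction that labels every pixel as background scores poorly even though its pixel accuracy is high, which is why an MCC term is a plausible complement to BCE when vessels occupy a small fraction of the image.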