Data-Centric Benchmark for Label Noise Estimation and Ranking in Remote Sensing Image Segmentation
This work addresses annotation errors in remote sensing segmentation, which can degrade model performance. Its contribution is incremental, however, because it builds on existing noise estimation methods.
The paper tackles the problem of label noise in remote sensing image segmentation by introducing a data-centric benchmark with a new dataset and two techniques for identifying and ranking noisy training samples, which outperform established baselines in experiments.
High-quality pixel-level annotations are essential for the semantic segmentation of remote sensing imagery. However, such labels are expensive to obtain and are often noisy: pixel-wise annotation is labor-intensive and time-consuming, which makes it difficult for human annotators to label every pixel accurately. Annotation errors can significantly degrade the performance and robustness of modern segmentation models, motivating the need for reliable mechanisms to identify and quantify noisy training samples. This paper introduces a novel data-centric benchmark, together with a new, publicly available dataset and two techniques for identifying, quantifying, and ranking training samples according to their level of label noise in remote sensing semantic segmentation. The proposed methods leverage complementary strategies based on model uncertainty, prediction consistency, and representation analysis, and consistently outperform established baselines across a range of experimental settings. The outcomes of this work are publicly available at https://github.com/keillernogueira/label_noise_segmentation.
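To make the idea of ranking training samples by label noise concrete, the following is a minimal sketch of a generic confidence-based heuristic. It is not one of the paper's two techniques, which are not specified in this abstract. It assumes a segmentation model has already produced per-pixel class probabilities; the function names `noise_scores` and `rank_by_noise`, the array shapes, and the weighting of disagreement by confidence are all illustrative assumptions.

```python
import numpy as np


def noise_scores(probs, labels, eps=1e-12):
    """Per-sample noise score from softmax outputs and annotated masks.

    probs:  float array of shape (N, C, H, W), class probabilities per pixel.
    labels: integer array of shape (N, H, W), annotated class per pixel.
    Hypothetical heuristic: a pixel looks suspicious when the model is
    confident (low entropy) yet assigns low probability to the annotated class.
    """
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    num_classes = probs.shape[1]

    # Probability the model assigns to the annotated class at each pixel.
    p_label = np.take_along_axis(probs, labels[:, None, :, :], axis=1)[:, 0]
    disagreement = 1.0 - p_label  # shape (N, H, W)

    # Normalized predictive entropy in [0, 1] as a simple uncertainty measure.
    entropy = -(probs * np.log(probs + eps)).sum(axis=1) / np.log(num_classes)

    # Down-weight disagreement where the model itself is uncertain.
    per_pixel = disagreement * (1.0 - entropy)
    return per_pixel.mean(axis=(1, 2))


def rank_by_noise(probs, labels):
    """Return sample indices sorted from most to least suspicious, plus scores."""
    scores = noise_scores(probs, labels)
    order = np.argsort(-scores, kind="stable")
    return order, scores


if __name__ == "__main__":
    # Two toy 2x2 samples with identical confident predictions for class 0.
    probs = np.empty((2, 2, 2, 2))
    probs[:, 0] = 0.9
    probs[:, 1] = 0.1
    labels = np.stack([np.zeros((2, 2), dtype=int),   # agrees with the model
                       np.ones((2, 2), dtype=int)])   # contradicts the model
    order, scores = rank_by_noise(probs, labels)
    print(order, scores)
```

In this toy case the second sample, whose mask contradicts a confident prediction, receives the higher score and is ranked first. A practical pipeline would likely use held-out or cross-validated predictions so that the model has not simply memorized the noisy labels.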