Active Label Correction for Semantic Segmentation with Foundation Models
This work addresses the annotation bottleneck facing computer vision researchers and practitioners by providing a more efficient way to correct errors in semantic segmentation datasets. The contribution is incremental, since it builds on existing foundation models and active learning techniques.
The paper tackles the labor-intensive pixel-wise annotation required for semantic segmentation. It proposes an active label correction framework that uses foundation models and superpixels to design annotator-friendly correction queries. The framework outperforms prior methods and yields a revised PASCAL dataset with errors corrected in 2.6 million pixels.
Training and validating models for semantic segmentation require datasets with pixel-wise annotations, which are notoriously labor-intensive to produce. Although useful priors such as foundation models or crowdsourced datasets are available, they are error-prone. We therefore propose an effective framework of active label correction (ALC) built on a correction query that rectifies the pseudo labels of pixels. According to our theoretical analysis and user study, this correction query is more annotator-friendly than the standard query, which asks the annotator to classify a pixel directly. Specifically, leveraging foundation models that provide useful zero-shot predictions of pseudo labels and superpixels, our method comprises two key techniques: (i) an annotator-friendly design of the correction query using the pseudo labels, and (ii) an acquisition function that looks ahead to label expansions based on the superpixels. Experimental results on the PASCAL, Cityscapes, and Kvasir-SEG datasets demonstrate the effectiveness of our ALC framework, which outperforms prior methods for active semantic segmentation and label correction. Notably, using our method, we obtained a revised version of the PASCAL dataset by rectifying errors in 2.6 million pixels.
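The two techniques named in the abstract can be illustrated with a toy sketch. The abstract does not specify the actual query protocol or acquisition function, so everything below is an assumption made for illustration: the function names (`correction_query`, `expand_correction`, `lookahead_score`) are hypothetical, the annotator is simulated from ground truth, and the score simply sums the uncertainty of the pixels that a correction would reach through its superpixel.

```python
import numpy as np

def correction_query(pseudo_label, true_label):
    """Simulated annotator (assumption): confirm the pseudo label if it is
    right, otherwise return the correct class."""
    return pseudo_label if pseudo_label == true_label else true_label

def expand_correction(pseudo, superpixels, pixel, new_label):
    """Copy the answer for one queried pixel to every pixel that shares
    its superpixel. Returns a new array; the input is not modified."""
    out = pseudo.copy()
    out[superpixels == superpixels[pixel]] = new_label
    return out

def lookahead_score(confidence, superpixels, pixel):
    """Hypothetical acquisition score: total uncertainty (1 - confidence)
    over the superpixel that a correction at `pixel` would expand to."""
    mask = superpixels == superpixels[pixel]
    return float(np.sum(1.0 - confidence[mask]))

# Toy 2x4 image with two superpixels; the right one is mislabeled (2 vs 3).
superpixels = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
pseudo = np.array([[1, 1, 2, 2], [1, 1, 2, 2]])
truth = np.array([[1, 1, 3, 3], [1, 1, 3, 3]])
confidence = np.array([[0.9, 0.9, 0.4, 0.5], [0.9, 0.9, 0.6, 0.5]])

# Pick the candidate pixel whose correction would cover the most uncertainty.
candidates = [(0, 0), (0, 2)]
best = max(candidates, key=lambda p: lookahead_score(confidence, superpixels, p))

# Ask the correction query for that pixel and expand the answer.
answer = correction_query(pseudo[best], truth[best])
corrected = expand_correction(pseudo, superpixels, best, answer)
```

In this toy case a single query repairs all four mislabeled pixels, which is the intuition behind scoring a query by the region its answer would expand to rather than by the queried pixel alone.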