CV · Jan 31, 2022

Modeling the Background for Incremental and Weakly-Supervised Semantic Segmentation

Fabio Cermelli, Massimiliano Mancini, Samuel Rota Buló, Elisa Ricci, Barbara Caputo

arXiv:2201.13338v1 · 11 citations

AI Analysis

This work addresses incremental learning and weak supervision in semantic segmentation, offering a novel method that outperforms state-of-the-art approaches, though the contribution itself is incremental in nature.

The paper tackles catastrophic forgetting and reliance on dense annotations in semantic segmentation by introducing an incremental class learning approach that addresses background semantic shift and extends to weak supervision, achieving significant performance improvements on Pascal-VOC, ADE20K, and Cityscapes datasets.

Deep neural networks have enabled major progress in semantic segmentation. However, even the most advanced neural architectures suffer from important limitations. First, they are vulnerable to catastrophic forgetting, i.e., they perform poorly when they are required to incrementally update their model as new classes become available. Second, they rely on large amounts of pixel-level annotations to produce accurate segmentation maps. To tackle these issues, we introduce a novel incremental class learning approach for semantic segmentation that takes into account a peculiar aspect of this task: since each training step provides annotations only for a subset of all possible classes, pixels of the background class exhibit a semantic shift. Therefore, we revisit the traditional distillation paradigm by designing novel loss terms which explicitly account for the background shift. Additionally, we introduce a novel strategy to initialize the classifier's parameters at each step in order to prevent biased predictions toward the background class. Finally, we demonstrate that our approach can be extended to point- and scribble-based weakly supervised segmentation, modeling the partial annotations to create priors for unlabeled pixels. We demonstrate the effectiveness of our approach with an extensive evaluation on the Pascal-VOC, ADE20K, and Cityscapes datasets, significantly outperforming state-of-the-art methods.
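The abstract does not give formulas, so the following is only a rough sketch of one way the two background-related ideas could be read, not the authors' implementation. It assumes a per-pixel logit vector laid out as [background, old classes, new classes]. The function names (`background_aware_nll`, `init_new_class_biases`) and that index layout are hypothetical. The first function treats a pixel labeled "background" at the current step as possibly belonging to any old class. The second spreads the old background bias evenly over the background and the new classes so the initial predictions are not skewed toward the background.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_softmax(logits):
    """Log-probabilities for one pixel's logits."""
    lse = logsumexp(logits)
    return [z - lse for z in logits]


def background_aware_nll(logits, label, n_old):
    """Negative log-likelihood for one pixel (hypothetical helper).

    Assumed layout: index 0 is background, indices 1..n_old are old
    classes, and the remaining indices are the new classes.
    A background label is scored against the combined probability of
    background and all old classes, since old classes are unlabeled
    at the current step.
    """
    logp = log_softmax(logits)
    if label == 0:
        return -logsumexp(logp[: n_old + 1])
    return -logp[label]


def init_new_class_biases(bg_bias, n_new):
    """Split the old background bias across background + new classes.

    Assumption: each of the n_new + 1 outputs gets
    bg_bias - log(n_new + 1), so their exponentials sum to exp(bg_bias).
    Returns (new_background_bias, list_of_new_class_biases).
    """
    shifted = bg_bias - math.log(n_new + 1)
    return shifted, [shifted] * n_new
```

With uniform logits over four outputs and one old class, a background label is scored against the two outputs (background plus the old class) rather than one, which is the intended effect of folding unlabeled old classes into the background term. The distillation counterpart and the weak-supervision priors mentioned in the abstract are not covered by this sketch.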
