CV · Feb 28, 2023

Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation

arXiv:2302.14250v2 · 20 citations · h-index: 21
Originality: Incremental advance
AI Analysis

The paper addresses the cost of annotation for semantic segmentation in computer vision. The advance is incremental, since it builds on existing weakly supervised and incremental learning methods.

The paper tackles weakly incremental learning for semantic segmentation (WILSS), in which a model learns new classes from cheap image-level labels instead of costly dense annotations. It proposes a framework, FMWISS, that uses foundation models to generate pseudo labels and a teacher-student architecture to refine them. FMWISS reaches 70.7% and 73.3% in the 15-5 VOC setting, outperforming the previous state of the art by 3.4% and 6.1%, respectively.
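To make the idea of turning image-level labels into dense pseudo labels concrete, here is a minimal sketch of one common weak-supervision step: suppressing every class that the image-level label says is absent before taking a per-pixel argmax. It is an illustration under stated assumptions, not the FMWISS pipeline. The function name `pseudo_labels_from_scores`, the `(C, H, W)` score layout, and the background index `0` are all hypothetical.

```python
import numpy as np

def pseudo_labels_from_scores(scores, present_classes, background=0):
    """Turn a per-class score map into a dense pseudo label.

    scores:          float array of shape (C, H, W), e.g. from some pretrained model
                     (assumed input; how FMWISS obtains its scores is not shown here).
    present_classes: class ids named in the image-level label.
    background:      index of the background class, always allowed (assumption).
    """
    num_classes = scores.shape[0]
    allowed = np.zeros(num_classes, dtype=bool)
    allowed[background] = True
    allowed[list(present_classes)] = True
    # Classes absent from the image-level label can never win the argmax.
    masked = np.where(allowed[:, None, None], scores, -np.inf)
    return masked.argmax(axis=0)

scores = np.array([
    [[0.1, 0.6], [0.2, 0.3]],   # background
    [[0.3, 0.2], [0.5, 0.1]],   # class 1 (present in the image-level label)
    [[0.9, 0.9], [0.9, 0.9]],   # class 2 (absent, so it is suppressed)
])
pseudo = pseudo_labels_from_scores(scores, present_classes={1})
```

Even though class 2 has the highest raw score at every pixel, the image-level label rules it out, so the pseudo label contains only background and class 1.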

Modern methods for incremental learning in semantic segmentation usually learn new categories from dense annotations. Although these methods achieve promising results, pixel-by-pixel labeling is costly and time-consuming. Weakly incremental learning for semantic segmentation (WILSS) is a novel and attractive task that aims to learn to segment new classes from cheap and widely available image-level labels. Despite comparable results, image-level labels cannot provide the details needed to locate each segment, which limits the performance of WILSS. This inspires us to consider how to improve and effectively utilize the supervision of new classes given image-level labels while avoiding forgetting old ones. In this work, we propose a novel and data-efficient framework for WILSS, named FMWISS. Specifically, we propose pre-training based co-segmentation to distill the knowledge of complementary foundation models for generating dense pseudo labels. We further optimize the noisy pseudo masks with a teacher-student architecture, where a plug-in teacher is optimized with a proposed dense contrastive loss. Moreover, we introduce memory-based copy-paste augmentation to mitigate catastrophic forgetting of old classes. Extensive experiments on the Pascal VOC and COCO datasets demonstrate the superior performance of our framework, e.g., FMWISS achieves 70.7% and 73.3% in the 15-5 VOC setting, outperforming the state-of-the-art method by 3.4% and 6.1%, respectively.
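The copy-paste augmentation mentioned above can be illustrated with a minimal sketch: pixels of an old-class object, kept in a memory bank, are pasted onto a new training image together with their label. The abstract does not describe how the memory bank is built, sampled, or placed, so everything here is an assumption: `copy_paste` is a hypothetical helper, and the stored sample is assumed to have the same spatial size as the target image.

```python
import numpy as np

def copy_paste(image, label, mem_image, mem_mask, mem_class):
    """Paste a stored old-class object onto a new training sample.

    image:     (H, W, 3) array, the new-step training image.
    label:     (H, W) int array, its (pseudo) label map.
    mem_image: (H, W, 3) array, a stored image containing an old-class object.
    mem_mask:  (H, W) bool array, True where the old-class object lies.
    mem_class: int, class id of the stored object.
    Returns augmented copies; the inputs are left unchanged.
    """
    out_image = image.copy()
    out_label = label.copy()
    out_image[mem_mask] = mem_image[mem_mask]   # overwrite pixels under the mask
    out_label[mem_mask] = mem_class             # and relabel them as the old class
    return out_image, out_label

# Toy example: paste a 2x2 object of (hypothetical) old class 7 onto a blank image.
image = np.zeros((4, 4, 3), dtype=np.uint8)
label = np.zeros((4, 4), dtype=np.int64)
mem_image = np.full((4, 4, 3), 255, dtype=np.uint8)
mem_mask = np.zeros((4, 4), dtype=bool)
mem_mask[1:3, 1:3] = True
aug_image, aug_label = copy_paste(image, label, mem_image, mem_mask, mem_class=7)
```

Training on such composites keeps old-class pixels in every batch even when the new-step data contains none, which is the intuition behind using replayed samples against forgetting.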

