CV | Sep 30, 2023

Dual-Augmented Transformer Network for Weakly Supervised Semantic Segmentation

arXiv:2310.00307v1 | 1.5 | h-index: 3

Originality: Incremental advance

AI Analysis

This paper addresses incomplete object segmentation when only class-level labels are available, a problem relevant to computer vision researchers. It represents an incremental improvement.

The paper tackles incomplete segmentation in weakly supervised semantic segmentation by proposing a dual-augmented transformer network with self-regularization constraints, achieving state-of-the-art performance on the PASCAL VOC 2012 benchmark.

Weakly supervised semantic segmentation (WSSS) is a fundamental computer vision task that aims to segment objects using only class-level labels. Traditional methods adopt a CNN-based network and use the class activation map (CAM) strategy to discover object regions. However, such methods focus only on the most discriminative region of an object, resulting in incomplete segmentation. An alternative is to use vision transformers (ViT) to encode the image and capture global semantic information. Yet ViT lacks an inductive bias toward objects. In this paper, we explore a dual-augmented transformer network with self-regularization constraints for WSSS. Specifically, we propose a dual network that combines a CNN-based network and a transformer network for mutually complementary learning, where both networks augment the final output. Extensive systematic evaluations on the challenging PASCAL VOC 2012 benchmark demonstrate the effectiveness of our method, which outperforms previous state-of-the-art methods.
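To make the CAM limitation mentioned in the abstract concrete, here is a minimal sketch of the standard class activation map computation (a classifier-weighted sum of feature channels) followed by thresholding. It is not the paper's network: the function names, the toy feature values, and the 0.6 threshold are all assumptions chosen for illustration. The toy input is built so that one location responds much more strongly than its neighbor, which is the situation in which a thresholded CAM keeps only the most discriminative part of an object.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM: weight each feature channel by the classifier weight
    for the target class, sum over channels, clip negatives, scale to [0, 1].

    features:   array of shape (C, H, W), last conv-layer activations
    fc_weights: array of shape (num_classes, C), linear classifier weights
    """
    # Weighted sum over the channel axis gives an (H, W) map.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def cam_to_mask(cam, threshold):
    """Binarize a normalized CAM into a pseudo-label mask."""
    return cam >= threshold


# Hypothetical toy input: 2 channels on a 2x2 grid, 2 classes.
features = np.array([
    [[4.0, 2.0], [0.0, 0.0]],   # channel 0 fires on the top row
    [[0.0, 0.0], [1.0, 3.0]],   # channel 1 fires on the bottom row
])
fc_weights = np.array([
    [1.0, 0.0],                 # class 0 relies on channel 0
    [0.0, 1.0],                 # class 1 relies on channel 1
])

cam = class_activation_map(features, fc_weights, class_idx=0)
mask = cam_to_mask(cam, threshold=0.6)  # threshold is an illustrative choice

print(cam)   # [[1.  0.5] [0.  0. ]]
print(mask)  # only the strongest location survives the threshold
```

In this toy case the top-right location belongs to the same response pattern as the top-left one, yet it falls below the threshold and is dropped from the mask. This is the kind of incomplete segmentation the abstract attributes to CAM-based methods.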
