CV | May 8, 2024

Weakly-supervised Semantic Segmentation via Dual-stream Contrastive Learning of Cross-image Contextual Information

arXiv:2405.04913v1 | 3.74 citations | h-index: 40 | IEEE Transactions on Industrial Informatics

Originality: Incremental advance

AI Analysis

This work targets segmentation accuracy under limited supervision in computer vision applications. It is an incremental advance that integrates cross-image information into existing weakly supervised segmentation methods.

The paper tackles the performance gap in weakly supervised semantic segmentation (WSSS) by proposing DSCNet, a framework that leverages dual-stream contrastive learning to incorporate both pixel-wise and semantic-wise contextual information, achieving state-of-the-art results on PASCAL VOC and MS COCO benchmarks.

Weakly supervised semantic segmentation (WSSS) aims to learn a semantic segmentation model from image-level tags only. Despite more than a decade of intensive research on deep learning approaches, a significant performance gap remains between WSSS and fully supervised semantic segmentation. Most current WSSS methods focus on limited single-image (pixel-wise) information while ignoring valuable inter-image (semantic-wise) information. From this perspective, a novel end-to-end WSSS framework called DSCNet is developed, with two innovations: i) pixel-wise group contrast and semantic-wise graph contrast are proposed and introduced into the WSSS framework; ii) a novel dual-stream contrastive learning (DSCL) mechanism is designed to jointly handle pixel-wise and semantic-wise context information for better WSSS performance. Specifically, the pixel-wise group contrast learning (PGCL) and semantic-wise graph contrast learning (SGCL) tasks together form a more comprehensive solution. Extensive experiments on the PASCAL VOC and MS COCO benchmarks verify the superiority of DSCNet over SOTA approaches and baseline models.
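To make the idea of contrasting pixels across images concrete, the following is a minimal sketch of a generic supervised-contrastive (InfoNCE-style) loss over pixel embeddings pooled from several images. It is not the paper's PGCL or SGCL formulation, which the abstract does not specify. The function name `group_contrastive_loss`, the `temperature` value, and the use of integer pseudo-labels to define groups are all assumptions made for illustration.

```python
import numpy as np

def group_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised-contrastive loss (illustrative, not the paper's PGCL).

    embeddings: (N, D) pixel features pooled from several images.
    labels:     (N,) integer pseudo-class ids (hypothetical grouping signal).
    """
    # L2-normalise so dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = sim.shape[0]
    self_mask = np.eye(n, dtype=bool)
    # Exclude each pixel's similarity with itself from the softmax.
    sim = np.where(self_mask, -np.inf, sim)

    # Positives: other pixels that share the anchor's pseudo-label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0  # no anchor has a positive partner

    # Numerically stable log-softmax over all other pixels.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Mean log-probability of positives per anchor, averaged over valid anchors.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())
```

Under these assumptions, embeddings that cluster by pseudo-label yield a lower loss than the same embeddings with mismatched labels, which is the behaviour a contrastive objective is meant to reward. Because the pixels in a batch may come from different images, same-class pixels from other images act as positives, which is one simple way to inject cross-image context.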

