Joint Learning of Saliency Detection and Weakly Supervised Semantic Segmentation
This work aims to improve weakly supervised semantic segmentation by integrating saliency detection; the contribution is incremental, as it builds on existing multi-task learning approaches.
The paper tackles the inefficiency of using pre-trained saliency detection models for weakly supervised semantic segmentation by proposing a joint learning framework, SSNet, which achieves competitive performance against state-of-the-art methods on PASCAL VOC 2012 and saliency benchmarks.
Existing weakly supervised semantic segmentation (WSSS) methods usually use the results of pre-trained saliency detection (SD) models without explicitly modeling the connections between the two tasks, which is not the most efficient configuration. Here we propose a unified multi-task learning framework that jointly solves WSSS and SD with a single network, i.e., the saliency and segmentation network (SSNet). SSNet consists of a segmentation network (SN) and a saliency aggregation module (SAM). For an input image, SN generates the segmentation result, and SAM predicts the saliency of each category and aggregates the segmentation masks of all categories into a saliency map. The proposed network is trained end-to-end with image-level category labels and class-agnostic pixel-level saliency labels. Experiments on the PASCAL VOC 2012 segmentation dataset and four saliency benchmark datasets show that our method compares favorably against state-of-the-art weakly supervised segmentation methods and fully supervised saliency detection methods.
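To make the aggregation step concrete, the following is a minimal NumPy sketch of one way per-category segmentation masks could be combined into a single saliency map. It assumes the aggregation is a weighted sum in which each category's mask is scaled by that category's predicted saliency score; the abstract does not specify the exact form, so this is an illustration and not the authors' implementation. The function name `aggregate_saliency`, the array shapes, and the toy values are all hypothetical.

```python
import numpy as np


def aggregate_saliency(seg_probs, class_saliency):
    """Combine per-category masks into one saliency map (illustrative sketch).

    seg_probs:      array of shape (C, H, W), per-pixel category probabilities.
    class_saliency: array of shape (C,), one saliency score in [0, 1] per category.
    Returns an (H, W) map where each pixel is the saliency-weighted sum of its
    category probabilities. The weighted-sum form is an assumption.
    """
    seg_probs = np.asarray(seg_probs, dtype=float)
    class_saliency = np.asarray(class_saliency, dtype=float)
    if seg_probs.ndim != 3:
        raise ValueError("seg_probs must have shape (C, H, W)")
    if class_saliency.shape != (seg_probs.shape[0],):
        raise ValueError("class_saliency must have shape (C,)")
    # Contract the category axis: sum_c class_saliency[c] * seg_probs[c].
    return np.tensordot(class_saliency, seg_probs, axes=1)


# Toy example: 3 hypothetical categories (background, cat, sofa) on a 2x2 image.
seg_probs = np.array([
    [[0.9, 0.1], [0.2, 0.7]],  # background
    [[0.1, 0.8], [0.1, 0.1]],  # cat
    [[0.0, 0.1], [0.7, 0.2]],  # sofa
])
class_saliency = np.array([0.0, 1.0, 0.5])  # background is treated as non-salient
saliency_map = aggregate_saliency(seg_probs, class_saliency)
```

In this toy case the pixel dominated by the highly salient category receives the largest value, while pixels dominated by background stay close to zero. Because the map is a differentiable function of the segmentation output, a pixel-level saliency loss on it can also supervise the segmentation branch, which is the intuition behind training the two tasks jointly.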