Mixup-CAM: Weakly-supervised Semantic Segmentation via Uncertainty Regularization
This work addresses a key bottleneck in weakly-supervised semantic segmentation, namely the incomplete object response maps produced by classification networks, and offers an incremental improvement over existing methods.
The paper tackles weakly-supervised semantic segmentation with image-level labels, where existing methods produce incomplete object response maps because they attend only to the most discriminative regions. The proposed Mixup-CAM framework uses mixup data augmentation and uncertainty regularization to generate more complete and uniform response maps, achieving favorable performance against state-of-the-art approaches.
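As background for the term "response map", here is a minimal sketch of a class activation map (CAM) in its standard formulation. It assumes a network ending in global average pooling followed by a linear classifier, so the map for one class is the classifier-weighted sum of the final feature channels, passed through a ReLU and normalized to [0, 1]. The function name and the toy inputs are hypothetical illustrations, not code from the paper, and plain Python lists stand in for tensors to keep the sketch dependency-free.

```python
def class_activation_map(features, class_weights):
    """Hypothetical CAM helper.

    features: K channel maps, each a list of H rows of W values.
    class_weights: K classifier weights for a single class.
    Returns an H x W map, ReLU-ed and max-normalized to [0, 1].
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    # Weighted sum over channels: CAM(y, x) = sum_k w_k * F_k(y, x).
    for weight, channel in zip(class_weights, features):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * channel[y][x]
    peak = max(max(row) for row in cam)
    if peak <= 0:
        # No positive evidence for this class anywhere.
        return [[0.0] * width for _ in range(height)]
    return [[max(v, 0.0) / peak for v in row] for row in cam]


# Usage: two toy 2 x 3 feature channels and one class's weights.
features = [
    [[0.0, 1.0, 4.0],
     [0.0, 0.0, 2.0]],
    [[2.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]],
]
cam = class_activation_map(features, [1.0, 0.5])
# Thresholding keeps only the strongest responses.
mask = [[v >= 0.5 for v in row] for row in cam]
```

Thresholding the normalized map keeps only the cells with the strongest evidence. This illustrates the incompleteness the paper targets: regions that the classifier does not need in order to recognize the class end up with low activation and fall outside the mask.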
Obtaining object response maps is an important step toward weakly-supervised semantic segmentation with image-level labels. However, existing methods rely on the classification task, which can yield a response map that attends only to discriminative object regions, since the network does not need to see the entire object to optimize the classification loss. To tackle this issue, we propose a principled, end-to-end trainable framework that allows the network to attend to other parts of the object while producing a more complete and uniform response map. Specifically, we introduce the mixup data augmentation scheme into the classification network and design two uncertainty regularization terms to better interact with the mixup strategy. In experiments, we conduct extensive analysis to validate the proposed method and show favorable performance against state-of-the-art approaches.
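The mixup step described above can be sketched as follows, under clearly stated assumptions. Standard mixup draws a coefficient λ from Beta(α, α) and forms convex combinations of two images and of their labels. Because image-level labels can contain several classes, the sketch blends multi-hot vectors and trains with per-class binary cross-entropy on the resulting soft targets. The abstract does not specify the two uncertainty regularization terms, so the sketch substitutes a generic entropy penalty on the class probabilities. That penalty is a hypothetical stand-in and is not claimed to be either of the paper's terms. All function names, the weight `entropy_weight`, and the fixed toy logits are illustrative; in practice the logits would come from the classification network and the computation would run on tensors in a deep-learning framework.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mixup(image_a, labels_a, image_b, labels_b, alpha=1.0, rng=random):
    """Blend two flattened images and their multi-hot labels, lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    mixed_image = [lam * a + (1.0 - lam) * b for a, b in zip(image_a, image_b)]
    mixed_labels = [lam * ya + (1.0 - lam) * yb for ya, yb in zip(labels_a, labels_b)]
    return mixed_image, mixed_labels, lam


def multilabel_bce(logits, targets, eps=1e-7):
    """Mean per-class binary cross-entropy; targets may be soft (mixed) labels."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(logits)


def prediction_entropy(logits, eps=1e-7):
    """Mean binary entropy of per-class probabilities (a generic uncertainty measure)."""
    total = 0.0
    for z in logits:
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return total / len(logits)


def mixup_objective(logits, mixed_labels, entropy_weight=0.1):
    """Hypothetical loss: BCE on mixed targets plus a stand-in entropy penalty."""
    return multilabel_bce(logits, mixed_labels) + entropy_weight * prediction_entropy(logits)


# Usage: two toy 2-pixel "images", each labeled with one of two classes.
rng = random.Random(0)
image, labels, lam = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], alpha=1.0, rng=rng)
loss = mixup_objective([0.3, -0.2], labels)
```

Blending labels with the same λ as the pixels makes the target state how much of each class is present in the mixed image. Minimizing an entropy term pushes the network toward confident predictions on such ambiguous inputs. This is one common way to regularize uncertainty, chosen here only for illustration.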