SSA: Semantic Structure Aware Inference for Weakly Pixel-Wise Dense Predictions without Cost
SSA addresses the computational overhead and complexity of training extra modules for weakly supervised dense prediction, offering a parameter-free, training-free solution that is broadly applicable.
The paper tackles the problem of generating high-quality Class Attention Maps (CAM) for weakly supervised pixel-wise dense predictions without additional training or parameters, achieving new state-of-the-art results on weakly-supervised object localization and semantic segmentation tasks.
Weakly supervised pixel-wise dense prediction tasks currently rely on Class Attention Maps (CAM) to generate pseudo masks that serve as ground truth. However, existing methods typically depend on carefully trained additional modules, which can introduce heavy computational overhead and complex training procedures. In this work, semantic structure aware inference (SSA) is proposed to exploit the semantic structure information hidden in different stages of a CNN-based network and generate high-quality CAMs at inference time. Specifically, a semantic structure modeling module (SSM) is first proposed to generate a class-agnostic semantic correlation representation, in which each item denotes the degree of affinity between one category of objects and all the others. The structured feature representation is then used to polish an immature CAM via a dot product operation. Finally, the polished CAMs from different backbone stages are fused to form the output. The proposed method has no parameters and requires no training, so it can be applied to a wide range of weakly supervised pixel-wise dense prediction tasks. Experimental results on both weakly supervised object localization and weakly supervised semantic segmentation demonstrate the effectiveness of the proposed method, which achieves new state-of-the-art results on both tasks.
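The three steps described above (model semantic structure, polish the CAM with a dot product, fuse across stages) can be illustrated with a minimal NumPy sketch. The abstract does not specify how the affinity is computed or how the stages are fused, so the choices here are assumptions made only for illustration: cosine similarity between pixel features with negatives clipped, row normalization, nearest-neighbor resizing, and a plain average for fusion. The function names `semantic_affinity`, `polish_cam`, `resize_nearest`, and `ssa_inference` are hypothetical and are not taken from the paper.

```python
import numpy as np

def semantic_affinity(feat):
    """feat: (C, H, W) stage features -> (H*W, H*W) row-normalized affinity.
    Assumption: cosine similarity between pixel features, negatives clipped."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                        # (N, C), one row per pixel
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    aff = np.maximum(x @ x.T, 0.0)                      # class-agnostic pairwise affinity
    return aff / (aff.sum(axis=1, keepdims=True) + 1e-8)

def polish_cam(cam, feat):
    """cam: (K, H, W) coarse CAM; feat: (C, H, W) features at the same resolution.
    Each pixel's score becomes an affinity-weighted sum of all CAM scores."""
    k, h, w = cam.shape
    aff = semantic_affinity(feat)                       # (N, N)
    refined = cam.reshape(k, h * w) @ aff.T             # the dot-product polishing step
    return refined.reshape(k, h, w)

def resize_nearest(cam, size):
    """Nearest-neighbor resize of a (K, h, w) map to (K, H, W)."""
    _, h, w = cam.shape
    H, W = size
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    return cam[:, rows[:, None], cols[None, :]]

def ssa_inference(cam, feats):
    """cam: (K, H, W); feats: list of (C_i, H_i, W_i) features from backbone stages.
    Assumption: polished CAMs are fused by a simple average."""
    _, H, W = cam.shape
    polished = []
    for feat in feats:
        _, h, w = feat.shape
        cam_s = resize_nearest(cam, (h, w))             # match the stage resolution
        polished.append(resize_nearest(polish_cam(cam_s, feat), (H, W)))
    return np.mean(polished, axis=0)

# Usage with random stand-ins for a coarse CAM and two stage feature maps.
rng = np.random.default_rng(0)
cam = rng.random((3, 8, 8))
feats = [rng.standard_normal((16, 8, 8)), rng.standard_normal((32, 4, 4))]
refined = ssa_inference(cam, feats)                     # same shape as cam
```

Nothing in this sketch is learned: the affinity is derived directly from features the backbone already produces, which is what makes a procedure of this kind parameter-free and usable purely at inference time.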