Sparse Spatial Attention Network for Semantic Segmentation
This work addresses the computational bottleneck of spatial attention in semantic segmentation models and offers an incremental improvement over existing attention methods.
The paper tackles the inefficiency of spatial attention mechanisms in semantic segmentation by proposing a sparse spatial attention network (SSANet). Its sparse non-local (SNL) block adaptively samples a subset of key and value elements for each query, and the network reports state-of-the-art performance on the Cityscapes, PASCAL Context, and ADE20K datasets.
The spatial attention mechanism captures long-range dependencies by aggregating global contextual information to each query location, which is beneficial for semantic segmentation. In this paper, we present a sparse spatial attention network (SSANet) that improves the efficiency of the spatial attention mechanism without sacrificing performance. Specifically, a sparse non-local (SNL) block is proposed that samples a subset of key and value elements for each query element to capture long-range relations adaptively, and generates a sparse affinity matrix to aggregate contextual information efficiently. Experimental results show that the proposed approach outperforms other context aggregation methods and achieves state-of-the-art performance on the Cityscapes, PASCAL Context, and ADE20K datasets.
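To make the idea of a sparse affinity matrix concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a softmax over plain dot products between each query and its sampled keys, and it takes the sampled positions as a ready-made input. The function name `sparse_attention` and the argument `sample_idx` are hypothetical, and how the SNL block actually selects samples adaptively is not reproduced here.

```python
import math


def sparse_attention(queries, keys, values, sample_idx):
    """Aggregate context for each query from only its sampled keys and values.

    queries:    list of N vectors of length C
    keys:       list of N vectors of length C
    values:     list of N vectors of length D
    sample_idx: list of N index lists, each with K positions into keys/values
                (hypothetical input; the sampling rule itself is an assumption
                and is NOT the paper's adaptive sampling)
    Returns (outputs, affinity): outputs is N x D, affinity is N x K.
    """
    dim = len(values[0])
    outputs, affinity = [], []
    for q, idx in zip(queries, sample_idx):
        # Dot products against the K sampled keys only, not all N keys.
        logits = [sum(a * b for a, b in zip(q, keys[j])) for j in idx]
        # Numerically stable softmax over the sampled positions.
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the sampled values gives the aggregated context.
        out = [sum(w * values[j][d] for w, j in zip(weights, idx))
               for d in range(dim)]
        outputs.append(out)
        affinity.append(weights)
    return outputs, affinity


# Tiny usage example with N = 2 positions.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0], [3.0]]
sample_idx = [[0, 1], [1]]  # query 0 samples both positions, query 1 only itself
outputs, affinity = sparse_attention(queries, keys, values, sample_idx)
```

With K sampled positions per query, the affinity holds N x K entries instead of the N x N entries of a dense non-local block, which is where the efficiency gain described above would come from when K is much smaller than N.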