SACANet: scene-aware class attention network for semantic segmentation of remote sensing images
This work addresses accurate semantic segmentation of remote sensing images for applications such as land cover mapping, though the contribution appears incremental because it builds on existing attention mechanisms.
The paper tackles semantic segmentation of remote sensing images by proposing SACANet, which integrates scene-aware and class attention mechanisms to improve context modeling and reduce background noise, and reports state-of-the-art performance on three datasets.
The spatial attention mechanism has been widely used in semantic segmentation of remote sensing images because of its ability to model long-range dependencies. Many methods that adopt spatial attention aggregate contextual information using direct relationships between pixels within an image, while ignoring the scene awareness of pixels (i.e., being aware of the global context of the scene where the pixels are located and perceiving their relative positions). Observing that scene awareness benefits context modeling through the spatial correlations of ground objects, we design a scene-aware attention module based on a refined spatial attention mechanism that embeds scene awareness. In addition, we present a local-global class attention mechanism to address the problem that general attention mechanisms introduce excessive background noise while hardly considering the large intra-class variance in remote sensing images. We integrate scene-aware and class attention into a scene-aware class attention network (SACANet) for semantic segmentation of remote sensing images. Experimental results on three datasets show that SACANet outperforms other state-of-the-art methods and validate its effectiveness. Code is available at https://github.com/xwmaxwma/rssegmentation.
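To make the contrast between pixel-to-pixel spatial attention and class-level attention concrete, the following is a minimal NumPy sketch of the two generic ideas. It is not the SACANet implementation: the function names `spatial_attention` and `class_center_attention`, the use of a coarse prediction to form class centers, and all tensor sizes are assumptions chosen for illustration, and the scene-aware and local-global components described above are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feats):
    # Generic pixel-to-pixel attention (illustrative, not the paper's module).
    # feats: (N, C) array, one row per pixel, N = H * W.
    c = feats.shape[1]
    scores = feats @ feats.T / np.sqrt(c)   # (N, N) pairwise pixel affinities
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ feats                  # (N, C) aggregated context

def class_center_attention(feats, coarse_logits):
    # Generic class-level attention (assumed formulation for illustration).
    # coarse_logits: (N, K) coarse per-pixel class scores.
    c = feats.shape[1]
    probs = softmax(coarse_logits, axis=-1)           # (N, K) soft class masks
    mass = probs.sum(axis=0)[:, None]                 # (K, 1) total mass per class
    centers = (probs.T @ feats) / (mass + 1e-6)       # (K, C) weighted class means
    scores = feats @ centers.T / np.sqrt(c)           # (N, K) pixel-to-class affinities
    weights = softmax(scores, axis=-1)
    return weights @ centers                          # (N, C) class-level context

# Toy input: an 8 x 8 feature map with 16 channels and 6 classes (arbitrary sizes).
rng = np.random.default_rng(0)
h, w, c, k = 8, 8, 16, 6
feats = rng.normal(size=(h * w, c))
coarse_logits = rng.normal(size=(h * w, k))

out_spatial = spatial_attention(feats)
out_class = class_center_attention(feats, coarse_logits)
```

In this sketch each pixel in `class_center_attention` attends to K class centers rather than to all N pixels, which is one common way to limit the influence of unrelated background pixels; how SACANet combines local and global class representations is described in the paper and released code, not here.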