CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks
This work addresses limitations of existing attention mechanisms in CNNs and is aimed at researchers and practitioners in computer vision, though it appears incremental because it builds on prior channel-wise attention methods.
The authors propose a channel-wise spatially autocorrelated (CSA) attention mechanism for CNNs and report competitive performance and better generalization than prior attention-based CNNs on the ImageNet and MS COCO benchmarks for tasks such as image classification and object detection.
In recent years, convolutional neural networks (CNNs) with channel-wise feature refining mechanisms have brought noticeable benefits to modelling channel dependencies. However, current attention paradigms fail to infer an optimal channel descriptor capable of simultaneously exploiting statistical and spatial relationships among feature maps. In this paper, to overcome this shortcoming, we present a novel channel-wise spatially autocorrelated (CSA) attention mechanism. Inspired by geographical analysis, the proposed CSA exploits the spatial relationships between channels of feature maps to produce an effective channel descriptor. To the best of our knowledge, this is the first time that the concept of geographical spatial analysis has been utilized in deep CNNs. The proposed CSA adds a negligible number of learnable parameters and light computational overhead to the deep model, making it a powerful yet efficient attention module of choice. We validate the effectiveness of the proposed CSA networks (CSA-Nets) through extensive experiments and analysis on the ImageNet and MS COCO benchmark datasets for image classification, object detection, and instance segmentation. The experimental results demonstrate that CSA-Nets consistently achieve competitive performance and better generalization than several state-of-the-art attention-based CNNs across different benchmark tasks and datasets.
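To illustrate how a spatial-autocorrelation statistic could produce a channel descriptor, the following is a minimal NumPy sketch. It is not the CSA module itself: the abstract does not specify the statistic, so the sketch assumes a local Moran's I style measure borrowed from geographical analysis, in which each channel is treated as a location whose value is its global average, and the weights between channels are row-standardised inverse distances. The function name `csa_like_attention`, the weighting scheme, and the sigmoid gate are hypothetical choices made for illustration only.

```python
import numpy as np


def csa_like_attention(x, eps=1e-8):
    """Hypothetical autocorrelation-based channel attention (illustration only).

    x: feature maps of shape (C, H, W).
    Returns (rescaled feature maps, per-channel gates in (0, 1)).
    """
    c = x.shape[0]
    flat = x.reshape(c, -1)

    # Statistical descriptor: global average pooling per channel.
    z = flat.mean(axis=1)

    # Assumed "spatial" weights: inverse Euclidean distance between channels,
    # computed from the Gram matrix to avoid a (C, C, H*W) intermediate.
    sq = (flat ** 2).sum(axis=1)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T, 0.0, None)
    w = 1.0 / (np.sqrt(d2) + eps)
    np.fill_diagonal(w, 0.0)                      # no self-neighbourhood
    w = w / (w.sum(axis=1, keepdims=True) + eps)  # row-standardise

    # Local Moran's I style score for each channel.
    dev = z - z.mean()
    m2 = (dev ** 2).mean() + eps
    local_i = dev * (w @ dev) / m2

    # Assumed gating: sigmoid of the score, broadcast over spatial positions.
    gate = 1.0 / (1.0 + np.exp(-local_i))
    return x * gate[:, None, None], gate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(8, 5, 5))
    refined, gates = csa_like_attention(features)
    print(refined.shape, gates.shape)
```

In this sketch the descriptor is parameter-free, which is one way an attention module could keep its overhead low; a learned variant would typically pass the scores through a small trainable layer before gating.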