CV · Dec 13, 2022

CAT: Learning to Collaborate Channel and Spatial Attention from Multi-Information Fusion

arXiv:2212.06335v1 · 7 citations · h-index: 45
Originality: Incremental advance
AI Analysis

This work addresses the need for better attention mechanisms in computer vision models, offering incremental improvements by integrating existing attention types more effectively.

The paper proposes CAT, a plug-and-play attention module for deep CNNs that makes channel and spatial attention collaborate instead of treating them separately. It does this with learned traits (trainable coefficients) and a three-way pooling operation. The authors report that CAT outperforms existing state-of-the-art attention mechanisms on MS COCO, Pascal VOC, CIFAR-100, and ImageNet for object detection, instance segmentation, and image classification.

Channel and spatial attention mechanisms have been shown to provide a clear performance boost for deep convolutional neural networks (CNNs). Most existing methods focus on only one of the two or run them in parallel (or in series), neglecting the collaboration between the two attentions. To better establish feature interaction between the two types of attention, we propose a plug-and-play attention module, which we term "CAT": activating the Collaboration between spatial and channel Attentions based on learned Traits. Specifically, we represent traits as trainable coefficients (i.e., colla-factors) that adaptively combine the contributions of different attention modules to better fit different image hierarchies and tasks. Moreover, in addition to the global average pooling (GAP) and global maximum pooling (GMP) operators, we propose global entropy pooling (GEP), an effective component for suppressing noise signals by measuring the information disorder of feature maps. We introduce a three-way pooling operation into the attention modules and apply the adaptive mechanism to fuse their outcomes. Extensive experiments on MS COCO, Pascal VOC, CIFAR-100, and ImageNet show that CAT outperforms existing state-of-the-art attention mechanisms in object detection, instance segmentation, and image classification. The model and code will be released soon.
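
To make the three-way pooling idea concrete, here is a minimal, framework-free sketch in Python. It is not the authors' implementation, and the abstract does not give the exact formulas, so two parts are assumptions made only for illustration. First, GEP is modeled as the Shannon entropy of a softmax taken over the spatial positions of each channel. Second, the colla-factors are modeled as three scalars that are softmax-normalized before weighting the GAP, GMP, and GEP descriptors. All function names are hypothetical.

```python
import math


def _flatten(fmap):
    """Flatten one channel (a list of rows) into a single list of values."""
    return [v for row in fmap for v in row]


def _softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_average_pool(fmap):
    """GAP: mean over all spatial positions of one channel."""
    values = _flatten(fmap)
    return sum(values) / len(values)


def global_max_pool(fmap):
    """GMP: maximum over all spatial positions of one channel."""
    return max(_flatten(fmap))


def global_entropy_pool(fmap):
    """GEP (assumed form): Shannon entropy of the spatial softmax.

    A flat channel gives the maximum value, log(H * W); a channel with
    one dominant activation gives a value close to zero.
    """
    probs = _softmax(_flatten(fmap))
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def three_way_pool(feature_maps, colla_factors):
    """Fuse GAP, GMP, and GEP per channel.

    colla_factors stands in for three trainable scalars; normalizing
    them with a softmax is an assumption of this sketch.
    """
    w_avg, w_max, w_ent = _softmax(list(colla_factors))
    return [
        w_avg * global_average_pool(f)
        + w_max * global_max_pool(f)
        + w_ent * global_entropy_pool(f)
        for f in feature_maps
    ]


# Two toy 2x2 channels: one flat, one with a single strong activation.
flat = [[1.0, 1.0], [1.0, 1.0]]
peaked = [[0.0, 0.0], [0.0, 10.0]]
descriptors = three_way_pool([flat, peaked], colla_factors=(0.0, 0.0, 0.0))
```

With equal colla-factors, each channel descriptor is the plain average of the three pooled values. In a trained module the factors would be learned, so the network could shift weight toward whichever pooling statistic is most useful at a given layer or for a given task.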

