CV · AI · May 23, 2023

Efficient Multi-Scale Attention Module with Cross-Spatial Learning

arXiv:2305.13563v2 · 1374 citations
Originality: Incremental advance
AI Analysis

This paper addresses a known weakness of channel attention, the information lost to channel dimensionality reduction, and offers computer vision researchers and practitioners an incremental improvement over existing methods.

The paper tackles a side effect of channel attention: modeling cross-channel relationships through channel dimensionality reduction can degrade the deep visual representations a network extracts. It proposes an efficient multi-scale attention (EMA) module that retains per-channel information while reducing computational overhead. The paper reports improved performance on image classification and object detection across benchmarks such as CIFAR-100, ImageNet-1k, MS COCO, and VisDrone2019, though the abstract gives no specific numbers.

The remarkable effectiveness of channel or spatial attention mechanisms in producing more discernible feature representations has been illustrated in various computer vision tasks. However, modeling cross-channel relationships with channel dimensionality reduction may bring side effects in extracting deep visual representations. In this paper, a novel efficient multi-scale attention (EMA) module is proposed. Focusing on retaining per-channel information and decreasing computational overhead, we reshape part of the channels into the batch dimension and group the channel dimension into multiple sub-features, which keeps the spatial semantic features well distributed inside each feature group. Specifically, apart from encoding the global information to re-calibrate the channel-wise weights in each parallel branch, the output features of the two parallel branches are further aggregated by a cross-dimension interaction to capture pixel-level pairwise relationships. We conduct extensive ablation studies and experiments on image classification and object detection tasks with popular benchmarks (e.g., CIFAR-100, ImageNet-1k, MS COCO, and VisDrone2019) to evaluate its performance.
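
The data flow described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the function name `grouped_cross_spatial_attention`, the `groups` parameter, and the two branch callables are assumptions made for this example, and the learned convolutions and normalization a real module would need are replaced by fixed placeholder functions. It shows three ideas from the text: folding channel groups into the batch dimension, re-calibrating with globally pooled per-channel descriptors, and combining the two branches crosswise into a per-pixel attention map.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def grouped_cross_spatial_attention(x, groups, branch_a, branch_b):
    """Hypothetical sketch of the abstract's data flow (no learned weights).

    x        : array of shape (B, C, H, W)
    groups   : number of channel groups folded into the batch dimension
    branch_* : shape-preserving callables standing in for the two
               parallel branches (placeholders, not the paper's layers)
    """
    b, c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channels must be divisible by groups")
    cg = c // groups

    # Fold the groups into the batch dimension: (B, C, H, W) -> (B*G, C/G, H, W).
    # No channel is discarded, unlike channel dimensionality reduction.
    gx = x.reshape(b * groups, cg, h, w)

    # Two parallel branches over the grouped features.
    fa = branch_a(gx)
    fb = branch_b(gx)

    # Global average pooling gives one descriptor per channel; a softmax over
    # channels turns it into weights of shape (B*G, 1, C/G).
    wa = softmax(fa.mean(axis=(2, 3)), axis=-1)[:, None, :]
    wb = softmax(fb.mean(axis=(2, 3)), axis=-1)[:, None, :]

    # Cross interaction: the channel weights of one branch are applied to the
    # flattened spatial features of the other, giving (B*G, 1, H*W).
    fa_flat = fa.reshape(b * groups, cg, h * w)
    fb_flat = fb.reshape(b * groups, cg, h * w)
    attn = wa @ fb_flat + wb @ fa_flat

    # One gate per pixel per group, applied to the grouped input.
    attn = sigmoid(attn).reshape(b * groups, 1, h, w)
    return (gx * attn).reshape(b, c, h, w)


x = np.random.default_rng(0).normal(size=(2, 8, 4, 4))
out = grouped_cross_spatial_attention(
    x, groups=4, branch_a=lambda t: t, branch_b=np.tanh
)
print(out.shape)  # (2, 8, 4, 4)
```

The output has the same shape as the input, so a block like this could in principle be dropped between existing layers. The identity and `tanh` branches here are stand-ins chosen only to keep the example runnable without trained parameters.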

Code Implementations: 1 repo
