CV · AI · Nov 7, 2022

MogaNet: Multi-order Gated Aggregation Network

arXiv:2211.03295v4 · 156 citations · h-index: 57 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a key limitation in ConvNets for computer vision tasks, offering improved efficiency and performance, though it is incremental as it builds on existing ConvNet architectures.

The paper tackles the representation bottleneck in modern ConvNets by proposing MogaNet, a new family of ConvNets that uses multi-order gated aggregation for efficient feature learning. MogaNet reaches 80.0% and 87.8% accuracy on ImageNet-1K with 5.2M and 181M parameters, outperforming ParC-Net and ConvNeXt-L while saving 59% FLOPs and 17M parameters, respectively.

By contextualizing the kernel as global as possible, modern ConvNets have shown great potential in computer vision tasks. However, recent progress on multi-order game-theoretic interaction within deep neural networks (DNNs) reveals the representation bottleneck of modern ConvNets, where the expressive interactions have not been effectively encoded with the increased kernel size. To tackle this challenge, we propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning in pure ConvNet-based models with favorable complexity-performance trade-offs. MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module, where discriminative features are efficiently gathered and contextualized adaptively. MogaNet exhibits great scalability, impressive efficiency of parameters, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet and various downstream vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D and 3D human pose estimation, and video prediction. Notably, MogaNet hits 80.0% and 87.8% accuracy with 5.2M and 181M parameters on ImageNet-1K, outperforming ParC-Net and ConvNeXt-L, while saving 59% FLOPs and 17M parameters, respectively. The source code is available at https://github.com/Westlake-AI/MogaNet.
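
To make the phrase "convolutions and gated aggregation in a compact module" more concrete, here is a minimal NumPy sketch of one possible gated aggregation step. It is not the authors' implementation (see the linked repository for that). The function names, the 3x3 kernels, the dilation rates (1, 2, 3), the even channel split, and the SiLU activation are all illustrative assumptions. The idea shown is only this: channel groups are contextualized by depthwise convolutions with different receptive fields, and the result is modulated element-wise by a learned gate.

```python
import numpy as np

def silu(x):
    # SiLU / swish activation: x * sigmoid(x). Assumed here for illustration.
    return x / (1.0 + np.exp(-x))

def pointwise(x, w):
    # 1x1 convolution: mixes channels at each pixel. x: (C_in, H, W), w: (C_out, C_in).
    return np.einsum("oc,chw->ohw", w, x)

def depthwise_conv2d(x, kernels, dilation=1):
    # Depthwise (per-channel) cross-correlation with zero "same" padding, stride 1.
    # x: (C, H, W); kernels: (C, k, k) with odd k.
    _, h, w = x.shape
    k = kernels.shape[-1]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            di, dj = i * dilation, j * dilation
            out += kernels[:, i, j][:, None, None] * xp[:, di:di + h, dj:dj + w]
    return out

def init_params(channels, kernel_size=3, dilations=(1, 2, 3), seed=0):
    # Hypothetical random parameters; a trained model would learn these.
    rng = np.random.default_rng(seed)
    sizes = [len(g) for g in np.array_split(np.arange(channels), len(dilations))]
    return {
        "w_gate": rng.standard_normal((channels, channels)) * 0.1,
        "w_value": rng.standard_normal((channels, channels)) * 0.1,
        "w_out": rng.standard_normal((channels, channels)) * 0.1,
        "dw_kernels": [rng.standard_normal((s, kernel_size, kernel_size)) * 0.1
                       for s in sizes],
    }

def gated_aggregation(x, params, dilations=(1, 2, 3)):
    # Gate branch: per-pixel channel mixing followed by SiLU.
    gate = silu(pointwise(x, params["w_gate"]))
    # Value branch: each channel group sees a different receptive field.
    groups = np.array_split(x, len(dilations), axis=0)
    context = [depthwise_conv2d(g, k, d)
               for g, k, d in zip(groups, params["dw_kernels"], dilations)]
    value = silu(pointwise(np.concatenate(context, axis=0), params["w_value"]))
    # Element-wise gating, then an output projection.
    return pointwise(gate * value, params["w_out"])

x = np.random.default_rng(1).standard_normal((6, 8, 8))
y = gated_aggregation(x, init_params(6))
print(y.shape)  # (6, 8, 8)
```

The gate and value branches share the same input, so the module preserves the feature-map shape and could be stacked like any other residual block. Larger kernels or different channel ratios would only change `init_params`.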

Code Implementations: 7 repos

Data from Papers with Code (CC-BY-SA-4.0)

