CV · Aug 30, 2021

Hire-MLP: Vision MLP via Hierarchical Rearrangement

arXiv:2108.13341v2 · 120 citations · Has Code
AI Analysis

The paper aims to make MLPs a general-purpose backbone for computer vision, offering a competitive alternative to transformers with a better accuracy-throughput trade-off, though its contribution is an incremental improvement over existing MLP architectures.

The paper tackles the inflexibility and limited performance of vision MLPs by proposing Hire-MLP, which uses hierarchical rearrangement to capture local and global spatial information, achieving competitive results such as 83.8% top-1 accuracy on ImageNet and 51.7% box AP on COCO.

Previous vision MLPs such as MLP-Mixer and ResMLP accept linearly flattened image patches as input, which makes them inflexible to different input sizes and makes spatial information hard to capture. This design keeps MLPs from matching the performance of their transformer-based counterparts and prevents them from becoming a general backbone for computer vision. This paper presents Hire-MLP, a simple yet competitive vision MLP architecture built on Hierarchical rearrangement (hence "Hire"), which contains two levels of rearrangement. Specifically, the inner-region rearrangement captures local information inside a spatial region, and the cross-region rearrangement enables communication between different regions and captures global context by circularly shifting all tokens along spatial directions. Extensive experiments demonstrate the effectiveness of Hire-MLP as a versatile backbone for various vision tasks. In particular, Hire-MLP achieves competitive results on image classification, object detection and semantic segmentation, e.g., 83.8% top-1 accuracy on ImageNet, 51.7% box AP and 44.8% mask AP on COCO val2017, and 49.9% mIoU on ADE20K, surpassing previous transformer-based and MLP-based models with a better trade-off between accuracy and throughput. Code is available at https://github.com/ggjy/Hire-Wave-MLP.pytorch.
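The two rearrangement levels described in the abstract can be illustrated with a small NumPy sketch along the height axis. This is not the authors' implementation (see the linked repository for that). The function names, the `(H, W, C)` layout, the region size `h`, the shift step `s`, and the order of operations (shift, group, mix, restore, shift back) are all assumptions made for illustration, and `mix` stands in for whatever channel MLP would be applied to the grouped features.

```python
import numpy as np

def inner_region_rearrange(x, h):
    """Group every h consecutive tokens along the height axis into one region
    and concatenate their channels: (H, W, C) -> (H // h, W, h * C)."""
    H, W, C = x.shape
    assert H % h == 0, "this sketch assumes H is divisible by the region size h"
    # (H, W, C) -> (H//h, h, W, C) -> (H//h, W, h, C) -> (H//h, W, h*C)
    return x.reshape(H // h, h, W, C).transpose(0, 2, 1, 3).reshape(H // h, W, h * C)

def inner_region_restore(y, h):
    """Inverse of inner_region_rearrange: (G, W, h * C) -> (G * h, W, C)."""
    G, W, hC = y.shape
    C = hC // h
    return y.reshape(G, W, h, C).transpose(0, 2, 1, 3).reshape(G * h, W, C)

def cross_region_shift(x, s):
    """Circularly shift all tokens by s positions along the height axis, so
    that the fixed-size regions formed afterwards straddle the old boundaries."""
    return np.roll(x, shift=s, axis=0)

def hire_height_branch(x, h, s, mix):
    """Assumed order of operations for one height-direction branch.
    `mix` is a hypothetical callable acting on the last (channel) axis."""
    shifted = cross_region_shift(x, s)             # cross-region rearrangement
    regions = inner_region_rearrange(shifted, h)   # inner-region rearrangement
    mixed = mix(regions)                           # channel mixing per region
    restored = inner_region_restore(mixed, h)      # undo the inner-region grouping
    return cross_region_shift(restored, -s)        # undo the circular shift

# Toy input: 4 x 2 grid of tokens with 3 channels each.
x = np.arange(4 * 2 * 3).reshape(4, 2, 3)
grouped = inner_region_rearrange(x, h=2)
out = hire_height_branch(x, h=2, s=1, mix=lambda t: t)  # identity mixer
```

With an identity mixer the branch returns its input unchanged, which is a quick check that the rearrangements are pure, invertible reshuffles. Any learning happens only in the mixing step, and because the grouping depends on the region size rather than on the full image size, the same weights can in principle be applied to inputs of different resolutions.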

Code Implementations · 10 repos

Data from Papers with Code (CC-BY-SA-4.0)

