CV | Mar 31, 2023

Rethinking Local Perception in Lightweight Vision Transformer

arXiv:2303.17803v5 | 59 citations | h-index: 28 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the problem of making vision transformers efficient for mobile applications. It represents an incremental improvement in lightweight model design.

The paper tackles the performance degradation of Vision Transformers when scaled down for mobile use by introducing CloFormer, a lightweight model that uses AttnConv to capture high-frequency local information, achieving superior results in image classification, object detection, and semantic segmentation.

Vision Transformers (ViTs) have been shown to be effective in various vision tasks. However, resizing them to a mobile-friendly size leads to significant performance degradation. Therefore, developing lightweight vision transformers has become a crucial area of research. This paper introduces CloFormer, a lightweight vision transformer that leverages context-aware local enhancement. CloFormer explores the relationship between the globally shared weights often used in vanilla convolutional operators and the token-specific, context-aware weights that appear in attention, then proposes an effective and straightforward module to capture high-frequency local information. In CloFormer, we introduce AttnConv, an attention-style convolution operator. The proposed AttnConv uses shared weights to aggregate local information and deploys carefully designed context-aware weights to enhance local features. CloFormer combines AttnConv with vanilla attention that uses pooling to reduce FLOPs, which enables the model to perceive both high-frequency and low-frequency information. Extensive experiments were conducted in image classification, object detection, and semantic segmentation, demonstrating the superiority of CloFormer. The code is available at https://github.com/qhfan/CloFormer.
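The two ideas in the abstract, shared weights for local aggregation and context-aware weights for enhancement, can be illustrated with a toy sketch. This is not the authors' implementation: it works on a 1-D, single-channel token sequence in pure Python, all function names are hypothetical, and the tanh gate computed from a query-key product is an assumed stand-in for the "carefully designed context-aware weights", which the abstract does not specify. The pooling helper only illustrates why shortening the key/value sequence reduces the number of attention scores.

```python
import math


def shared_local_aggregate(values, kernel):
    """Slide ONE fixed kernel over every position (convolution-style,
    globally shared weights), with zero padding at the borders."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(values))
    ]


def context_aware_gate(queries, keys):
    """Per-token weight in [-1, 1] derived from that token's own query and key.
    ASSUMPTION: a tanh of the elementwise product; the real design may differ."""
    return [math.tanh(q * k) for q, k in zip(queries, keys)]


def attn_conv_toy(values, queries, keys, kernel):
    """Aggregate locally with shared weights, then rescale each token by its
    own context-aware weight."""
    aggregated = shared_local_aggregate(values, kernel)
    gate = context_aware_gate(queries, keys)
    return [g * a for g, a in zip(gate, aggregated)]


def avg_pool(tokens, stride):
    """Average non-overlapping windows, shortening the sequence."""
    pooled = []
    for start in range(0, len(tokens), stride):
        window = tokens[start:start + stride]
        pooled.append(sum(window) / len(window))
    return pooled


def attention_score_count(num_queries, num_keys):
    """Number of query-key scores a vanilla attention layer must compute."""
    return num_queries * num_keys
```

Under these assumptions, every token shares the same kernel in `shared_local_aggregate`, while `context_aware_gate` gives each token its own multiplier. For the pooled branch, 16 queries attending to 16 keys need 256 scores, whereas pooling the keys with stride 4 leaves 4 keys and 64 scores.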

Code Implementations: 1 repo