CV · Jul 18, 2023

RepViT: Revisiting Mobile CNN From ViT Perspective

arXiv:2307.09283v8 · 585 citations · h-index: 59 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for high-performance, low-latency vision models on resource-constrained mobile devices, representing an incremental improvement over existing lightweight CNNs and ViTs.

The paper tackles the design of efficient lightweight models for mobile devices by enhancing MobileNetV3 with architectural designs drawn from lightweight Vision Transformers. The resulting model family, RepViT, achieves over 80% top-1 accuracy on ImageNet with 1.0 ms latency on an iPhone 12, and its RepViT-SAM variant runs inference nearly 10x faster than MobileSAM.

Recently, lightweight Vision Transformers (ViTs) demonstrate superior performance and lower latency, compared with lightweight Convolutional Neural Networks (CNNs), on resource-constrained mobile devices. Researchers have discovered many structural connections between lightweight ViTs and lightweight CNNs. However, the notable architectural disparities in the block structure, macro, and micro designs between them have not been adequately examined. In this study, we revisit the efficient design of lightweight CNNs from ViT perspective and emphasize their promising prospect for mobile devices. Specifically, we incrementally enhance the mobile-friendliness of a standard lightweight CNN, i.e., MobileNetV3, by integrating the efficient architectural designs of lightweight ViTs. This ends up with a new family of pure lightweight CNNs, namely RepViT. Extensive experiments show that RepViT outperforms existing state-of-the-art lightweight ViTs and exhibits favorable latency in various vision tasks. Notably, on ImageNet, RepViT achieves over 80% top-1 accuracy with 1.0 ms latency on an iPhone 12, which is the first time for a lightweight model, to the best of our knowledge. Besides, when RepViT meets SAM, our RepViT-SAM can achieve nearly 10× faster inference than the advanced MobileSAM. Codes and models are available at https://github.com/THU-MIG/RepViT.
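The abstract's headline numbers are per-inference latencies and a relative speedup. As a rough illustration of how such figures are usually obtained, here is a minimal timing sketch using only the Python standard library. It is not the authors' benchmarking setup: their latencies were measured on an iPhone 12, and the names `measure_latency_ms`, `speedup`, and `dummy_model` are hypothetical helpers introduced here, with `dummy_model` standing in for a real forward pass.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(fn: Callable[[], object], warmup: int = 10, runs: int = 100) -> Dict[str, float]:
    """Time a zero-argument callable and report per-call latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Warm-up calls are discarded so caches and lazy initialisation do not skew the samples.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if candidate_ms <= 0:
        raise ValueError("candidate_ms must be positive")
    return baseline_ms / candidate_ms


def dummy_model() -> int:
    # Hypothetical stand-in workload; replace with a real model's forward pass.
    return sum(i * i for i in range(1000))


stats = measure_latency_ms(dummy_model, warmup=5, runs=50)
print(stats)
```

Reporting the median alongside the mean helps because a few slow outlier runs can inflate the mean. A ratio such as the "nearly 10×" figure corresponds to `speedup(baseline_ms, candidate_ms)` computed from two latencies measured on the same device under the same conditions.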

Code Implementations · 8 repos

Data from Papers with Code (CC-BY-SA-4.0)

