CV · Jun 12, 2025

FaceLiVT: Face Recognition using Linear Vision Transformer with Structural Reparameterization For Mobile Device

Novendra Setyawan, Chi-Chia Sun, Mao-Hsiu Hsu, Wen-Kai Kuo, Jun-Wei Hsieh

arXiv:2506.10361v1 · 7 citations · h-index: 5 · ICIP

Originality: Incremental advance

AI Analysis

This work provides an incremental improvement for real-time face recognition on resource-constrained platforms such as mobile devices.

The paper tackles efficient face recognition on mobile devices by introducing FaceLiVT, a lightweight hybrid CNN-Transformer model with a Multi-Head Linear Attention mechanism. It reports 8.6x faster inference than EdgeFace and 21.2x faster inference than a pure ViT-based model while maintaining competitive accuracy on benchmarks such as LFW and IJB-C.

This paper introduces FaceLiVT, a lightweight yet powerful face recognition model that integrates a hybrid Convolutional Neural Network (CNN)-Transformer architecture with an innovative and lightweight Multi-Head Linear Attention (MHLA) mechanism. By combining MHLA with a reparameterized token mixer, FaceLiVT effectively reduces computational complexity while preserving competitive accuracy. Extensive evaluations on challenging benchmarks, including LFW, CFP-FP, AgeDB-30, IJB-B, and IJB-C, highlight its superior performance compared to state-of-the-art lightweight models. MHLA notably improves inference speed, allowing FaceLiVT to deliver high accuracy with lower latency on mobile devices. Specifically, FaceLiVT is 8.6x faster than EdgeFace, a recent hybrid CNN-Transformer model optimized for edge devices, and 21.2x faster than a pure ViT-based model. With its balanced design, FaceLiVT offers an efficient and practical solution for real-time face recognition on resource-constrained platforms.
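The abstract does not spell out how the reparameterized token mixer is built, so the following is only a minimal sketch of the general idea behind structural reparameterization, not FaceLiVT's actual layer. It assumes a hypothetical two-branch mixer (a 1-D depthwise convolution plus an identity skip) and shows that the skip can be folded into the convolution kernel, so inference runs a single branch with identical output. The names `conv1d_same`, `train_time_mixer`, and `reparameterize` are illustrative and do not come from the paper.

```python
def conv1d_same(x, k):
    """Single-channel 1-D cross-correlation with zero padding (odd kernel size)."""
    pad = len(k) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k[j] * padded[i + j] for j in range(len(k))) for i in range(len(x))]


def train_time_mixer(x, k):
    """Assumed training-time form: convolution branch plus identity skip."""
    return [c + xi for c, xi in zip(conv1d_same(x, k), x)]


def reparameterize(k):
    """Fold the identity branch into the kernel by adding 1 to the center tap."""
    fused = list(k)
    fused[len(k) // 2] += 1.0
    return fused


x = [0.5, -1.0, 2.0, 0.25, 1.5]   # toy token sequence (one channel)
k = [0.2, -0.4, 0.1]              # toy kernel weights

two_branch = train_time_mixer(x, k)
single_branch = conv1d_same(x, reparameterize(k))

# Both forms produce the same output up to floating-point rounding.
assert all(abs(a - b) < 1e-9 for a, b in zip(two_branch, single_branch))
```

The equivalence holds because convolution is linear: an identity mapping is itself a convolution whose kernel is 1 at the center and 0 elsewhere, so the two kernels can simply be summed. The fused form performs one pass over the input instead of two, which is the kind of inference-time saving such a design targets.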

