CV · Apr 10

FaceLiVTv2: An Improved Hybrid Architecture for Efficient Mobile Face Recognition

Novendra Setyawan, Chi-Chia Sun, Mao-Hsiu Hsu, Wen-Kai Kuo, Jun-Wei Hsieh

arXiv:2604.09127 · 54.51 citations · h-index: 5 · Has Code

Predicted impact: top 64% in CV · last 90 days
Originality: Incremental advance

AI Analysis

This work addresses the problem of efficient real-time face recognition for deployment on edge and mobile devices, presenting an incremental improvement over previous hybrid architectures.

The paper tackles the challenge of balancing accuracy and computational efficiency in mobile face recognition by introducing FaceLiVTv2, an improved hybrid CNN-Transformer architecture. According to the abstract, it reduces mobile inference latency by 22% relative to its predecessor, FaceLiVTv1, and achieves speedups of up to 30.8% over GhostFaceNets on mobile devices while maintaining higher recognition accuracy.

Lightweight face recognition is increasingly important for deployment on edge and mobile devices, where strict constraints on latency, memory, and energy consumption must be met alongside reliable accuracy. Although recent hybrid CNN-Transformer architectures have advanced global context modeling, striking an effective balance between recognition performance and computational efficiency remains an open challenge. In this work, we present FaceLiVTv2, an improved version of our FaceLiVT hybrid architecture designed for efficient global-local feature interaction in mobile face recognition. At its core is Lite MHLA, a lightweight global token interaction module that replaces the original multi-layer attention design with multi-head linear token projections and affine rescale transformations, reducing redundancy while preserving representational diversity across heads. We further integrate Lite MHLA into a unified RepMix block that coordinates local and global feature interactions and adopts global depthwise convolution for adaptive spatial aggregation in the embedding stage. Under our experimental setup, results on LFW, CA-LFW, CP-LFW, CFP-FP, AgeDB-30, and IJB show that FaceLiVTv2 consistently improves the accuracy-efficiency trade-off over existing lightweight methods. Notably, FaceLiVTv2 reduces mobile inference latency by 22% relative to FaceLiVTv1, achieves speedups of up to 30.8% over GhostFaceNets on mobile devices, and delivers 20-41% latency improvements over EdgeFace and KANFace across platforms while maintaining higher recognition accuracy. These results demonstrate that FaceLiVTv2 offers a practical and deployable solution for real-time face recognition. Code is available at https://github.com/novendrastywn/FaceLiVT.
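The abstract names two building blocks without giving their equations: multi-head linear token projections followed by an affine rescale, and global depthwise convolution for spatial aggregation. The NumPy sketch below is one possible reading of those phrases, not the authors' implementation. The function names, tensor layouts, and the choice of one learned token-mixing matrix per head are all assumptions made for illustration; the linked repository is the authoritative reference.

```python
import numpy as np


def lite_token_mixer(x, token_weights, gamma, beta):
    """Hypothetical multi-head linear token mixing with an affine rescale.

    x:             (N, C) array of N tokens with C channels.
    token_weights: (H, N, N) array, one token-mixing matrix per head (assumed).
    gamma, beta:   (C,) per-channel scale and shift (the assumed "affine rescale").
    """
    n_tokens, channels = x.shape
    heads = token_weights.shape[0]
    if channels % heads != 0:
        raise ValueError("channels must be divisible by the number of heads")
    head_dim = channels // heads

    # Split channels into heads: (N, C) -> (H, N, head_dim).
    x_heads = x.reshape(n_tokens, heads, head_dim).transpose(1, 0, 2)

    # Each head mixes information across tokens with its own linear map.
    mixed = np.matmul(token_weights, x_heads)  # (H, N, head_dim)

    # Merge heads back: (H, N, head_dim) -> (N, C).
    mixed = mixed.transpose(1, 0, 2).reshape(n_tokens, channels)

    # Per-channel affine rescale.
    return gamma * mixed + beta


def global_depthwise_conv(feature_map, kernel):
    """Global depthwise convolution: the kernel covers the whole feature map.

    feature_map, kernel: (C, H, W). Each channel is reduced to one value by a
    learned spatial weighting, so the output has shape (C,).
    """
    return (feature_map * kernel).sum(axis=(1, 2))


# Toy shapes chosen only for illustration.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((49, 64))            # 7x7 tokens, 64 channels
weights = rng.standard_normal((4, 49, 49)) / 49   # 4 heads
mixed_tokens = lite_token_mixer(tokens, weights, np.ones(64), np.zeros(64))

fmap = mixed_tokens.T.reshape(64, 7, 7)           # tokens back to a (C, H, W) map
embedding = global_depthwise_conv(fmap, rng.standard_normal((64, 7, 7)))
print(mixed_tokens.shape, embedding.shape)
```

In this reading, the token-mixing cost is fixed by the learned matrices rather than computed from query-key products, which is one way a module could avoid the softmax attention step. With a uniform kernel, the global depthwise convolution reduces to global average pooling; learned weights let each channel emphasize different spatial positions.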
