CV · Jan 23, 2025

LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition

arXiv:2501.13420v3 · 7 citations · h-index: 4 · Has Code
Originality: Highly original
AI Analysis

This paper addresses face recognition accuracy for applications such as security and identification, though it appears incremental because it builds on existing ViT methods.

The paper tackles a bottleneck in face recognition: Vision Transformers (ViTs) underperform because they are trained with CNN-inspired paradigms. It proposes LVFace with Progressive Cluster Optimization, which achieves state-of-the-art results, including 1st place in the ICCV 2021 Masked Face Recognition (MFR)-Ongoing Challenge (March 2025).

Vision Transformers (ViTs) have revolutionized large-scale visual modeling, yet remain underexplored in face recognition (FR), where CNNs still dominate. We identify a critical bottleneck: CNN-inspired training paradigms fail to unlock ViT's potential, leading to suboptimal performance and convergence instability. To address this challenge, we propose LVFace, a ViT-based FR model that integrates Progressive Cluster Optimization (PCO) to achieve superior results. Specifically, PCO proceeds in three sequential stages: it first applies negative class sub-sampling (NCS) for robust and fast feature alignment from random initialization, then applies feature expectation penalties for centroid stabilization, and finally performs cluster boundary refinement through full-batch training without NCS constraints. LVFace establishes a new state-of-the-art face recognition baseline, surpassing leading approaches such as UniFace and TopoFR across multiple benchmarks. Extensive experiments demonstrate that LVFace delivers consistent performance gains while scaling to large-scale datasets and remaining compatible with mainstream VLMs and LLMs. Notably, LVFace secured 1st place in the ICCV 2021 Masked Face Recognition (MFR)-Ongoing Challenge (March 2025), demonstrating its efficacy in real-world scenarios. The project is available at https://github.com/bytedance/LVFace.
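The abstract gives only the order of the three PCO stages, not their lengths or exact mechanics. The following is a minimal sketch of that stage schedule under stated assumptions, not LVFace's implementation. It assumes NCS works like Partial-FC-style sampling: every class present in the batch is kept, and the rest of the budget is filled with random negative classes. `pco_phase`, `ncs_end`, `penalty_end`, `sample_rate` and the other names are hypothetical. Whether NCS stays on in stage 2, and whether the penalty continues into stage 3, are also assumptions. The penalty itself is represented only as a flag because its form is not given here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseConfig:
    """Training switches for one hypothetical PCO stage."""
    name: str
    use_ncs: bool                  # sub-sample negative classes in the classifier
    use_expectation_penalty: bool  # add a feature expectation penalty term (form not specified)


def pco_phase(epoch, ncs_end, penalty_end):
    """Map an epoch index to one of the three stages, in the order the abstract lists them.

    ncs_end and penalty_end are hypothetical stage boundaries.
    Assumption: NCS stays on in stage 2 and is dropped only in stage 3;
    assumption: the expectation penalty is switched off in stage 3.
    """
    if epoch < ncs_end:
        return PhaseConfig("alignment", use_ncs=True, use_expectation_penalty=False)
    if epoch < penalty_end:
        return PhaseConfig("centroid_stabilization", use_ncs=True, use_expectation_penalty=True)
    return PhaseConfig("boundary_refinement", use_ncs=False, use_expectation_penalty=False)


def sample_class_subset(batch_labels, num_classes, sample_rate, rng):
    """Partial-FC-style negative class sub-sampling (an assumed stand-in for NCS).

    All classes in the batch are kept as positives; the remaining budget is
    filled with negatives drawn uniformly at random without replacement.
    """
    positives = set(batch_labels)
    budget = max(len(positives), int(num_classes * sample_rate))
    negatives = [c for c in range(num_classes) if c not in positives]
    k = min(budget - len(positives), len(negatives))
    return sorted(positives.union(rng.sample(negatives, k)))


def classes_for_step(batch_labels, num_classes, phase, sample_rate, rng):
    """Return the class prototypes that enter the softmax for one step."""
    if phase.use_ncs:
        return sample_class_subset(batch_labels, num_classes, sample_rate, rng)
    # Full-batch stage: every class prototype participates, with no NCS constraint.
    return list(range(num_classes))


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [3, 7, 7, 42]
    for epoch in (0, 12, 25):
        phase = pco_phase(epoch, ncs_end=10, penalty_end=20)
        classes = classes_for_step(labels, 100, phase, 0.1, rng)
        print(epoch, phase.name, len(classes))
    # prints:
    # 0 alignment 10
    # 12 centroid_stabilization 10
    # 25 boundary_refinement 100
```

In this sketch, sub-sampling keeps each early step cheap and limits the number of negative prototypes competing while features are still random. The final stage restores all classes so that cluster boundaries are refined against every negative.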
