CV · AI · CL · Oct 16, 2025

Benchmarking Multimodal Large Language Models for Face Recognition

arXiv:2510.14866v1 · 2 citations · h-index: 12 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the underexplored potential of MLLMs for face recognition, providing a benchmark for researchers to improve accuracy and generalization, but it is incremental as it focuses on evaluation rather than novel method development.

The paper benchmarks open-source multimodal large language models (MLLMs) for face recognition on standard datasets like LFW and CFP, finding that while MLLMs capture useful semantic cues, they underperform specialized models in high-precision zero-shot scenarios.

Multimodal large language models (MLLMs) have achieved remarkable performance across diverse vision-and-language tasks. However, their potential in face recognition remains underexplored. In particular, the performance of open-source MLLMs needs to be evaluated and compared with existing face recognition models on standard benchmarks under a similar protocol. In this work, we present a systematic benchmark of state-of-the-art MLLMs for face recognition on several face recognition datasets, including LFW, CALFW, CPLFW, CFP, AgeDB and RFW. Experimental results reveal that while MLLMs capture rich semantic cues useful for face-related tasks, they lag behind specialized models in high-precision, zero-shot recognition scenarios. This benchmark provides a foundation for advancing MLLM-based face recognition, offering insights for the design of next-generation models with higher accuracy and generalization. The source code of our benchmark is publicly available on the project page.
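The abstract does not say how the models are prompted or scored. As a rough illustration only, the sketch below assumes an LFW-style verification setup in which a model is shown a pair of face images, asked whether they depict the same person, and its free-text reply is reduced to a yes/no decision. The function names and the "starts with yes" parsing rule are hypothetical and are not taken from the paper's code.

```python
from typing import Iterable, Tuple


def parse_same_person(answer: str) -> bool:
    """Reduce a free-text reply to a same/different decision.

    Hypothetical rule: any reply beginning with "yes" counts as "same person".
    """
    return answer.strip().lower().startswith("yes")


def verification_accuracy(results: Iterable[Tuple[str, bool]]) -> float:
    """Fraction of pairs whose parsed reply matches the ground-truth label.

    Each item is (model_reply, is_same_person).
    """
    results = list(results)
    if not results:
        raise ValueError("no pairs to score")
    correct = sum(parse_same_person(reply) == label for reply, label in results)
    return correct / len(results)


# Toy replies for four image pairs (made-up data, not benchmark results).
toy_results = [
    ("Yes, same person.", True),
    ("No.", False),
    ("Yes", False),
    ("no, different", True),
]
print(verification_accuracy(toy_results))
```

Scoring parsed replies this way yields a single accuracy figure per dataset. Specialized face recognition models are usually scored by thresholding an embedding similarity instead, so any comparison between the two depends on how the yes/no decision is extracted.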

