CV, LG, Jan 9

ViTNT-FIQA: Training-Free Face Image Quality Assessment with Vision Transformers

arXiv:2601.05741v1
3 citations
h-index: 41
Originality: Incremental advance
AI Analysis

ViTNT-FIQA addresses the need for efficient and reliable quality assessment in face recognition systems. The advance is incremental, since it builds on existing Vision Transformer architectures.

The paper tackles the problem of face image quality assessment by proposing a training-free method that measures patch embedding stability across Vision Transformer blocks, achieving competitive performance on eight benchmarks while requiring only a single forward pass.

Face Image Quality Assessment (FIQA) is essential for reliable face recognition systems. Current approaches primarily exploit only final-layer representations, while training-free methods require multiple forward passes or backpropagation. We propose ViTNT-FIQA, a training-free approach that measures the stability of patch embedding evolution across intermediate Vision Transformer (ViT) blocks. We demonstrate that high-quality face images exhibit stable feature refinement trajectories across blocks, while degraded images show erratic transformations. Our method computes Euclidean distances between L2-normalized patch embeddings from consecutive transformer blocks and aggregates them into image-level quality scores. We empirically validate this correlation on a quality-labeled synthetic dataset with controlled degradation levels. Unlike existing training-free approaches, ViTNT-FIQA requires only a single forward pass without backpropagation or architectural modifications. Through extensive evaluation on eight benchmarks (LFW, AgeDB-30, CFP-FP, CALFW, Adience, CPLFW, XQLFW, IJB-C), we show that ViTNT-FIQA achieves competitive performance with state-of-the-art methods while maintaining computational efficiency and immediate applicability to any pre-trained ViT-based face recognition model.
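
The scoring idea described in the abstract can be illustrated with a minimal sketch. It assumes the per-block patch embeddings have already been extracted from a ViT in a single forward pass and stacked into one array. The function names (`l2_normalize`, `stability_score`), the use of a plain mean as the aggregation, and the sign convention (negated mean distance, so that higher means more stable) are assumptions made for illustration; the abstract does not specify the aggregation used in the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each embedding vector (last axis) to unit L2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def stability_score(block_embeddings):
    """Hypothetical image-level score from per-block patch embeddings.

    block_embeddings: array of shape (num_blocks, num_patches, dim),
    one set of patch embeddings per transformer block.
    """
    z = l2_normalize(np.asarray(block_embeddings, dtype=np.float64))
    # Euclidean distance between each patch's embedding in block k and block k+1.
    dists = np.linalg.norm(z[1:] - z[:-1], axis=-1)  # (num_blocks - 1, num_patches)
    # Assumption: aggregate with a plain mean and negate it, so that smaller
    # block-to-block changes (a more stable trajectory) give a higher score.
    return -float(dists.mean())


# Synthetic stand-ins for real ViT activations (not data from the paper).
rng = np.random.default_rng(0)
num_blocks, num_patches, dim = 12, 16, 32
base = rng.normal(size=(num_patches, dim))
stable = base + 0.01 * rng.normal(size=(num_blocks, num_patches, dim))
erratic = base + 1.0 * rng.normal(size=(num_blocks, num_patches, dim))

stable_score = stability_score(stable)
erratic_score = stability_score(erratic)
print(stable_score > erratic_score)
```

Because the embeddings are unit-normalized, each distance lies in [0, 2], so this sketch's score lies in [-2, 0]. With a real model, the stacked array would come from the intermediate block outputs of a pre-trained ViT face recognition network; how to expose those outputs depends on the model implementation.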

