CV · Oct 21, 2022

Face Pyramid Vision Transformer

arXiv:2210.11974v2 · 4 citations · h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses face recognition and verification, offering a novel architecture that improves performance with reduced computational costs, though it appears incremental as it builds upon existing Vision Transformer and CNN concepts.

The authors address the problem of learning discriminative multi-scale facial representations for face recognition and verification by proposing the Face Pyramid Vision Transformer (FPVT). Evaluated on seven benchmark datasets against ten state-of-the-art methods, FPVT performed strongly despite having fewer parameters.

A novel Face Pyramid Vision Transformer (FPVT) is proposed to learn discriminative multi-scale facial representations for face recognition and verification. In FPVT, Face Spatial Reduction Attention (FSRA) and Dimensionality Reduction (FDR) layers are employed to make the feature maps compact, thus reducing computation. An Improved Patch Embedding (IPE) algorithm is proposed to exploit the benefits of CNNs in ViTs (e.g., shared weights, local context, and receptive fields) to model features ranging from lower-level edges to higher-level semantic primitives. Within the FPVT framework, a Convolutional Feed-Forward Network (CFFN) is proposed that extracts locality information to learn low-level facial information. The proposed FPVT is evaluated on seven benchmark datasets and compared with ten existing state-of-the-art methods, including CNNs, pure ViTs, and Convolutional ViTs. Despite having fewer parameters, FPVT demonstrates excellent performance relative to the compared methods. The project page is available at https://khawar-islam.github.io/fpvt/
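To give a feel for why compacting the feature maps reduces computation, the following is a rough back-of-the-envelope sketch rather than the authors' implementation. It assumes that FSRA behaves like the spatial-reduction attention of the Pyramid Vision Transformer, where keys and values are downsampled by a factor `reduction` along each spatial axis while queries keep full resolution. The function name `attention_cost`, the 28x28 feature map, and the channel width of 64 are all hypothetical values chosen for illustration.

```python
def attention_cost(height, width, dim, reduction=1):
    """Approximate multiply-accumulate count for one single-head attention layer.

    Assumption (not taken from the paper): keys and values are downsampled by
    `reduction` along each spatial axis; queries stay at full resolution.
    Projection layers and the softmax itself are ignored.
    """
    n_query = height * width
    n_kv = (height // reduction) * (width // reduction)
    scores = n_query * n_kv * dim    # Q @ K^T
    weighted = n_query * n_kv * dim  # attention weights @ V
    return {"queries": n_query, "keys_values": n_kv, "macs": scores + weighted}


# Hypothetical stage: 28x28 feature map with 64 channels.
full = attention_cost(28, 28, 64)
reduced = attention_cost(28, 28, 64, reduction=4)
ratio = full["macs"] / reduced["macs"]
print(full["keys_values"], reduced["keys_values"], ratio)
```

Under these assumptions, a per-axis reduction of 4 shrinks the key/value sequence by a factor of 16, and the attention cost falls by the same factor. The actual savings in FPVT depend on the reduction ratios and stage sizes the authors chose.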

Code Implementations: 1 repo