CV | Oct 21, 2022

Face Pyramid Vision Transformer

Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood

arXiv:2210.11974v2 | 6.54 citations | h-index: 18 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses face recognition and verification, offering a novel architecture that improves performance with reduced computational costs, though it appears incremental as it builds upon existing Vision Transformer and CNN concepts.

The authors tackle the problem of learning discriminative multi-scale facial representations for face recognition and verification by proposing the Face Pyramid Vision Transformer (FPVT). Evaluated on seven benchmark datasets, FPVT is reported to perform well against ten state-of-the-art methods despite having fewer parameters.

A novel Face Pyramid Vision Transformer (FPVT) is proposed to learn discriminative multi-scale facial representations for face recognition and verification. In FPVT, Face Spatial Reduction Attention (FSRA) and Dimensionality Reduction (FDR) layers are employed to make the feature maps compact, thus reducing the computations. An Improved Patch Embedding (IPE) algorithm is proposed to exploit the benefits of CNNs in ViTs (e.g., shared weights, local context, and receptive fields) to model features ranging from lower-level edges to higher-level semantic primitives. Within the FPVT framework, a Convolutional Feed-Forward Network (CFFN) is proposed that extracts locality information to learn low-level facial information. The proposed FPVT is evaluated on seven benchmark datasets and compared with ten existing state-of-the-art methods, including CNNs, pure ViTs, and Convolutional ViTs. Despite having fewer parameters, FPVT demonstrates excellent performance relative to the compared methods. The project page is available at https://khawar-islam.github.io/fpvt/
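The abstract states that FSRA makes the feature maps compact and thereby reduces computation, but gives no formula. As a rough illustration only, the sketch below counts the entries of the query-key score matrix when keys and values are spatially downsampled before attention. It assumes FSRA behaves like the spatial-reduction attention used in pyramid-style ViTs, which the abstract does not confirm. The function name `attention_score_entries`, the 28x28 feature map, and the reduction ratio of 4 are hypothetical values chosen for the example, not figures from the paper.

```python
def attention_score_entries(height, width, reduction=1):
    """Count entries in the query-key score matrix for one attention head.

    Assumption (not taken from the paper): keys and values are downsampled
    by `reduction` along each spatial axis, while queries keep the full
    resolution of the feature map.
    """
    if height % reduction or width % reduction:
        raise ValueError("reduction must divide both height and width")
    num_queries = height * width
    num_keys = (height // reduction) * (width // reduction)
    return num_queries * num_keys


# Hypothetical 28x28 feature map, with and without a reduction ratio of 4.
full = attention_score_entries(28, 28)
reduced = attention_score_entries(28, 28, reduction=4)
print(full, reduced, full // reduced)
```

Under this assumption the score matrix shrinks by the square of the reduction ratio, which is one plausible reading of how a compact feature map lowers the attention cost.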
