Contrastive Context-Aware Learning for 3D High-Fidelity Mask Face Presentation Attack Detection
The work addresses the security vulnerabilities that realistic mask attacks create in face recognition systems, but the contribution is incremental: it builds on existing PAD methods, adding a new dataset and a new training approach.
The paper tackles the detection of high-fidelity mask attacks on face recognition systems. It introduces a large-scale dataset, CASIA-SURF HiFiMask, with 54,600 videos from 75 subjects and 225 masks, and proposes a Contrastive Context-aware Learning (CCL) framework that improves performance on this dataset and on three other 3D mask datasets.
Face presentation attack detection (PAD) is essential for securing face recognition systems, particularly against high-fidelity mask attacks. Most existing 3D mask PAD benchmarks suffer from several drawbacks: 1) a limited number of mask identities, sensor types, and videos; 2) low-fidelity facial masks. Basic deep models and remote photoplethysmography (rPPG) methods achieve acceptable performance on these benchmarks but are still far from meeting the needs of practical scenarios. To bridge the gap to real-world applications, we introduce a large-scale High-Fidelity Mask dataset, namely CASIA-SURF HiFiMask (briefly HiFiMask). Specifically, a total of 54,600 videos were recorded from 75 subjects with 225 realistic masks by 7 new kinds of sensors. Together with the dataset, we propose a novel Contrastive Context-aware Learning framework, namely CCL. CCL is a new training methodology for supervised PAD tasks, which learns accurately by leveraging rich contexts (e.g., subjects, mask materials, and lighting) among pairs of live faces and high-fidelity mask attacks. Extensive experimental evaluations on HiFiMask and three additional 3D mask datasets demonstrate the effectiveness of our method.
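The abstract does not spell out how CCL forms its pairs or what loss it uses, so the following is only a minimal sketch of the general idea of context-aware pairing, not the paper's method. It assumes each sample carries a hypothetical embedding plus context labels (subject, mask material, lighting), pairs samples that share a chosen context, and scores each pair with the classic margin-based pairwise contrastive loss as a stand-in. The names `Sample`, `context_pairs`, and `contrastive_loss` are illustrative.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Sample:
    """Hypothetical container: one face video reduced to an embedding plus context."""
    embedding: tuple   # feature vector from some backbone (assumed, not from the paper)
    is_live: bool      # True for a live face, False for a mask attack
    subject: str
    material: str      # mask material; "none" for live faces
    lighting: str


def context_pairs(samples, key):
    """Yield (a, b, same_class) for every pair of samples sharing the context `key`."""
    for a, b in combinations(samples, 2):
        if getattr(a, key) == getattr(b, key):
            yield a, b, a.is_live == b.is_live


def contrastive_loss(a, b, same_class, margin=1.0):
    """Classic pairwise contrastive loss (a stand-in, not the CCL loss).

    Same-class pairs are pulled together (d^2); live/mask pairs are pushed
    apart until their distance reaches `margin` (max(0, margin - d)^2).
    """
    d = math.dist(a.embedding, b.embedding)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    samples = [
        Sample((0.0, 0.0), True, "s1", "none", "dim"),
        Sample((3.0, 4.0), False, "s1", "resin", "dim"),
        Sample((0.0, 1.0), True, "s2", "none", "bright"),
    ]
    # Pair a live face with a mask attack of the same subject.
    for a, b, same in context_pairs(samples, "subject"):
        print(a.subject, same, contrastive_loss(a, b, same, margin=6.0))
```

Pairing within a shared context (here, the same subject) means the remaining difference between the two samples is mostly live versus mask, which is one plausible reading of "leveraging rich contexts among pairs"; swapping `key` to `"lighting"` or `"material"` gives the analogous pairing for the other contexts.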