FaceGuard: A Self-Supervised Defense Against Adversarial Face Images
This work matters because it improves the robustness of face recognition systems against adversarial attacks, a critical security concern for the users and organizations that rely on such systems.
This paper introduces FaceGuard, a self-supervised defense framework that tackles the problem of adversarial face images without relying on pre-computed adversarial training samples. FaceGuard achieves 99.81% detection accuracy on six unseen adversarial attack types on the LFW dataset and improves the face recognition performance of ArcFace from 34.27% TAR @ 0.1% FAR to 77.46% TAR @ 0.1% FAR.
Prevailing defense mechanisms against adversarial face images tend to overfit to the adversarial perturbations in the training set and fail to generalize to unseen adversarial attacks. We propose a new self-supervised adversarial defense framework, namely FaceGuard, that can automatically detect, localize, and purify a wide variety of adversarial faces without utilizing pre-computed adversarial training samples. During training, FaceGuard automatically synthesizes challenging and diverse adversarial attacks, enabling a classifier to learn to distinguish them from real faces, while a purifier attempts to remove the adversarial perturbations in the image space. Experimental results on the LFW dataset show that FaceGuard can achieve 99.81% detection accuracy on six unseen adversarial attack types. In addition, the proposed method can enhance the face recognition performance of ArcFace from 34.27% TAR @ 0.1% FAR under no defense to 77.46% TAR @ 0.1% FAR.
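The recognition numbers above are reported as TAR @ 0.1% FAR: the true accept rate measured at the decision threshold where the false accept rate is 0.1%. The following is a minimal sketch of how such a figure is commonly computed from similarity scores, not the authors' evaluation code. The function name `tar_at_far`, its arguments, and the strict "greater than threshold" acceptance rule are assumptions made for illustration.

```python
import numpy as np

def tar_at_far(genuine_scores, impostor_scores, far=0.001):
    """Return (TAR, threshold) at a fixed false accept rate.

    Hypothetical helper: scores are similarities, and a pair is accepted
    when its score is strictly greater than the threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    # Sort impostor scores from highest to lowest.
    impostor = np.sort(np.asarray(impostor_scores, dtype=float))[::-1]
    # Number of impostor pairs we may accept; the small epsilon guards
    # against floating-point error in far * N.
    k = int(far * impostor.size + 1e-9)
    # Threshold is the (k+1)-th highest impostor score, so at most k
    # impostor pairs score strictly above it.
    threshold = impostor[min(k, impostor.size - 1)]
    # TAR is the fraction of genuine pairs accepted at that threshold.
    tar = float(np.mean(genuine > threshold))
    return tar, float(threshold)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic scores for illustration only; these are not LFW results.
    genuine = rng.normal(0.7, 0.1, size=3000)
    impostor = rng.normal(0.1, 0.1, size=100000)
    tar, thr = tar_at_far(genuine, impostor, far=0.001)
    print(f"TAR @ 0.1% FAR = {tar:.4f} (threshold {thr:.4f})")
```

Under this convention, an adversarial attack lowers TAR by pushing genuine-pair scores below the threshold, and a successful purifier raises it again by restoring those scores.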