This Face Does Not Exist ... But It Might Be Yours! Identity Leakage in Generative Models
This work raises privacy concerns for individuals whose images are used to train generative models, highlighting a potential vulnerability in the creation of synthetic media.
This paper investigates whether identity information from training data can leak into synthetic faces generated by GANs. Using StyleGAN2 and five different face matchers, the authors demonstrate that identity leakage does occur for some matching methods, suggesting that synthetic faces might inadvertently reveal real identities.
Generative adversarial networks (GANs) can generate high-resolution, photo-realistic images of objects that "do not exist." These synthetic images are difficult to detect as fake. However, the manner in which these generative models are trained hints at a potential for information leakage from the supplied training data, especially in the context of synthetic faces. This paper presents experiments suggesting that identity information in face images can flow from the training corpus into synthetic samples without any adversarial action when building or using the existing model. This raises privacy-related questions, but also stimulates discussion of (a) the face manifold's characteristics in the feature space and (b) how to create generative models that do not inadvertently reveal identity information of real subjects whose images were used for training. We used five different face matchers (face_recognition, FaceNet, ArcFace, SphereFace and Neurotechnology MegaMatcher) and the StyleGAN2 synthesis model, and show that this identity leakage does exist for some, but not all, matchers. So, can we say that these synthetically generated faces truly do not exist? Databases of real and synthetically generated faces are made available with this paper to allow full replicability of the results discussed in this work.
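One way to picture the kind of comparison described above is the minimal sketch below. It is not the authors' protocol: it assumes each face has already been mapped to an embedding vector by some face matcher, uses random vectors as stand-ins for those embeddings, and calibrates a decision threshold on real faces that were not used for training. The function names, the 128-dimensional embedding size, and the 1% false-match rate are all hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def max_similarity_to_training(query, training):
    # For each query embedding, the highest cosine similarity to any training embedding.
    return (l2_normalize(query) @ l2_normalize(training).T).max(axis=1)

def leakage_rate(synthetic, training, unseen_real, false_match_rate=0.01):
    # Assumption: calibrate the threshold on real faces absent from the training set,
    # so that only about `false_match_rate` of them would "match" a training face.
    baseline = max_similarity_to_training(unseen_real, training)
    threshold = np.quantile(baseline, 1.0 - false_match_rate)
    scores = max_similarity_to_training(synthetic, training)
    # Fraction of synthetic faces that match a training face more closely than
    # unrelated real faces typically do.
    return float((scores > threshold).mean()), float(threshold)

# Toy data: random vectors stand in for matcher embeddings (hypothetical).
rng = np.random.default_rng(0)
dim = 128
training = rng.standard_normal((200, dim))
unseen_real = rng.standard_normal((500, dim))

# "Leaky" synthetic faces: slightly perturbed copies of training embeddings.
leaky = training[rng.integers(0, 200, size=500)] + 0.3 * rng.standard_normal((500, dim))
# "Clean" synthetic faces: unrelated to any training embedding.
clean = rng.standard_normal((500, dim))

leaky_rate, threshold = leakage_rate(leaky, training, unseen_real)
clean_rate, _ = leakage_rate(clean, training, unseen_real)
print(f"threshold={threshold:.3f} leaky={leaky_rate:.3f} clean={clean_rate:.3f}")
```

In this toy setting, a rate for synthetic samples well above the calibrated false-match rate would be read as a sign of leakage, while a rate near it would not. With real matchers the outcome may depend on the embedding used, which is consistent with the abstract's observation that leakage appears for some matchers but not others.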