CV AI Jul 28, 2025

Investigation of Accuracy and Bias in Face Recognition Trained with Synthetic Data

Pavel Korshunov, Ketan Kotwal, Christophe Ecabert, Vidit Vidit, Amir Mohammadi, Sebastien Marcel

arXiv:2507.20782v16.22 citations h-index: 12

Originality: Incremental advance

AI Analysis

This work addresses bias and performance issues in face recognition for applications requiring privacy and fairness, but it is incremental as it builds on existing synthetic data methods.

The study tackled the problem of achieving both high accuracy and fairness in face recognition systems trained with synthetic data, finding that demographically balanced synthetic datasets, particularly those generated with Stable Diffusion v3.5, show potential for bias mitigation while still lagging behind real datasets in generalization on challenging benchmarks.

Synthetic data has emerged as a promising alternative for training face recognition (FR) models, offering advantages in scalability, privacy compliance, and potential for bias mitigation. However, critical questions remain about whether both high accuracy and fairness can be achieved with synthetic data. In this work, we evaluate the impact of synthetic data on the bias and performance of FR systems. We generate a balanced face dataset, FairFaceGen, using two state-of-the-art text-to-image generators, Flux.1-dev and Stable Diffusion v3.5 (SD35), and combine them with several identity augmentation methods, including Arc2Face and four IP-Adapters. By maintaining an equal identity count across synthetic and real datasets, we ensure fair comparisons when evaluating FR performance on standard (LFW, AgeDB-30, etc.) and challenging IJB-B/C benchmarks, and FR bias on the Racial Faces in-the-Wild (RFW) dataset. Our results demonstrate that although synthetic data still lags behind real datasets in generalization on IJB-B/C, demographically balanced synthetic datasets, especially those generated with SD35, show potential for bias mitigation. We also observe that the number and quality of intra-class augmentations significantly affect FR accuracy and fairness. These findings provide practical guidelines for constructing fairer FR systems using synthetic data.
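
The abstract does not say which bias metric is used on RFW. As an illustration only, the sketch below assumes a common summary for that benchmark: verification accuracy per demographic group, then the mean, standard deviation, and max-min gap across groups. The function names, the fixed threshold, and the toy similarity scores are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_accuracy(pairs, threshold):
    """Verification accuracy for one group.

    pairs: list of (similarity, is_same_identity) tuples.
    A pair is predicted "same" when similarity >= threshold.
    """
    pairs = list(pairs)
    correct = sum((sim >= threshold) == same for sim, same in pairs)
    return correct / len(pairs)


def bias_summary(scores_by_group, threshold):
    """Per-group accuracy plus dispersion across groups (assumed metric)."""
    per_group = {
        group: group_accuracy(pairs, threshold)
        for group, pairs in scores_by_group.items()
    }
    values = list(per_group.values())
    return {
        "per_group": per_group,
        "mean": mean(values),
        "std": pstdev(values),            # lower means more even accuracy
        "gap": max(values) - min(values),  # best group minus worst group
    }


# Hypothetical toy scores for two of the RFW groups (not real results).
toy_scores = {
    "African": [(0.8, True), (0.2, False), (0.4, True), (0.6, False)],
    "Caucasian": [(0.9, True), (0.1, False), (0.7, True), (0.3, False)],
}
summary = bias_summary(toy_scores, threshold=0.5)
```

Under this assumed metric, a smaller `std` or `gap` at a similar `mean` would indicate a fairer model, which is the kind of trade-off the abstract describes between accuracy and bias.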
