CV · AI · Jul 28, 2025

Investigation of Accuracy and Bias in Face Recognition Trained with Synthetic Data

arXiv:2507.20782v1 · 2 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses bias and performance in face recognition for applications that require privacy and fairness, but it is an incremental advance that builds on existing synthetic data methods.

The study examines whether face recognition systems trained with synthetic data can be both accurate and fair. It finds that demographically balanced synthetic datasets, particularly those generated with Stable Diffusion v3.5, show potential for bias mitigation but still lag behind real datasets in generalization on challenging benchmarks.

Synthetic data has emerged as a promising alternative for training face recognition (FR) models, offering advantages in scalability, privacy compliance, and potential for bias mitigation. However, critical questions remain about whether both high accuracy and fairness can be achieved with synthetic data. In this work, we evaluate the impact of synthetic data on the bias and performance of FR systems. We generate a balanced face dataset, FairFaceGen, using two state-of-the-art text-to-image generators, Flux.1-dev and Stable Diffusion v3.5 (SD35), and combine them with several identity augmentation methods, including Arc2Face and four IP-Adapters. By maintaining an equal identity count across synthetic and real datasets, we ensure fair comparisons when evaluating FR performance on standard benchmarks (LFW, AgeDB-30, etc.) and the challenging IJB-B/C benchmarks, and FR bias on the Racial Faces in-the-Wild (RFW) dataset. Our results demonstrate that although synthetic data still lags behind real datasets in generalization to IJB-B/C, demographically balanced synthetic datasets, especially those generated with SD35, show potential for bias mitigation. We also observe that the number and quality of intra-class augmentations significantly affect FR accuracy and fairness. These findings provide practical guidelines for constructing fairer FR systems using synthetic data.
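The RFW bias evaluation compares face verification accuracy across the dataset's four demographic subsets (African, Asian, Caucasian, Indian). The abstract does not name the bias metrics the paper reports. The following is a minimal sketch that assumes two metrics common in the FR fairness literature: the standard deviation (STD) of per-group accuracy, and the skewed error ratio (SER), defined as the highest group error divided by the lowest. The function name `rfw_bias_metrics` and the choice of population rather than sample standard deviation are illustrative assumptions. The accuracies in the usage example are made-up values, not results from the paper.

```python
from statistics import mean, pstdev


def rfw_bias_metrics(accuracy_by_group: dict[str, float]) -> dict[str, float]:
    """Summarise per-group verification accuracy given in percent.

    Hypothetical helper: assumes STD and SER as the bias metrics,
    which the source abstract does not specify.
    """
    if len(accuracy_by_group) < 2:
        raise ValueError("need at least two demographic groups")
    accs = list(accuracy_by_group.values())
    # Error rate per group, in percent.
    errors = [100.0 - a for a in accs]
    if min(errors) <= 0:
        raise ValueError("SER is undefined when a group has zero error")
    return {
        "mean": mean(accs),               # average accuracy over groups
        "std": pstdev(accs),              # population std (assumption)
        "ser": max(errors) / min(errors), # worst error / best error
    }


if __name__ == "__main__":
    # Illustrative, made-up accuracies for the four RFW subsets.
    example = {"African": 90.0, "Asian": 92.0, "Caucasian": 96.0, "Indian": 94.0}
    print(rfw_bias_metrics(example))
```

With these illustrative values, the mean accuracy is 93.0, the SER is 2.5 (a 10% error rate against a 4% error rate), and the STD is about 2.24. Under this reading, a fairer model has a lower STD and an SER closer to 1, ideally without sacrificing mean accuracy.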
