CV · May 12, 2023

Zero-shot racially balanced dataset generation using an existing biased StyleGAN2

Anubhav Jain, Nasir Memon, Julian Togelius

arXiv:2305.07710v26.810 citations · Has Code

Originality: Incremental advance

AI Analysis

The work addresses bias in facial recognition systems, which matters for societal and security applications, but the advance is incremental because it builds on existing generative models.

The paper tackles biased facial recognition models caused by a lack of diversity in training datasets. It uses a biased StyleGAN2 to generate a racially balanced synthetic dataset with 50,000 identities per race (13.5 million images in total), and the authors report that training on this dataset improves model performance and reduces bias.

Facial recognition systems have made significant strides thanks to data-heavy deep learning models, but these models rely on large privacy-sensitive datasets. Further, many of these datasets lack diversity in terms of ethnicity and demographics, which can lead to biased models that can have serious societal and security implications. To address these issues, we propose a methodology that leverages the biased generative model StyleGAN2 to create demographically diverse images of synthetic individuals. The synthetic dataset is created using a novel evolutionary search algorithm that targets specific demographic groups. By training face recognition models with the resulting balanced dataset containing 50,000 identities per race (13.5 million images in total), we can improve their performance and minimize biases that might have been present in a model trained on a real dataset.
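The abstract names an evolutionary search that targets specific demographic groups but gives no algorithmic detail. The following is only a generic sketch of elitist evolutionary search over latent vectors, not the authors' method. The function names, the hyperparameters, and `toy_fitness` are all hypothetical; in a real pipeline the fitness would presumably be something like a demographic classifier's confidence on the image a generator produces from the latent, whereas here it is a stand-in that simply rewards closeness to a fixed target vector.

```python
import random

# Hypothetical stand-in target; a real setup would score generated images instead.
HYPOTHETICAL_TARGET = [1.0] * 8


def toy_fitness(z):
    """Stand-in fitness: negative squared distance to a fixed target vector."""
    return -sum((a - b) ** 2 for a, b in zip(z, HYPOTHETICAL_TARGET))


def evolve_latents(fitness, dim=8, pop_size=32, elite=8,
                   sigma=0.3, generations=40, seed=0):
    """Elitist evolutionary search: keep the best latents, mutate them, repeat."""
    rng = random.Random(seed)
    # Start from latents sampled from a standard normal prior.
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness and keep the top `elite` individuals unchanged.
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]
        # Refill the population with Gaussian-mutated copies of random parents.
        children = []
        while len(parents) + len(children) < pop_size:
            parent = rng.choice(parents)
            children.append([x + rng.gauss(0.0, sigma) for x in parent])
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve_latents(toy_fitness)
    print("best fitness:", toy_fitness(best))
```

Because the elite individuals are carried over unchanged, the best fitness in the population never decreases from one generation to the next. Swapping `toy_fitness` for a generator-plus-classifier score is the part that would make such a search target a demographic group, and that part is not specified in the abstract.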

