CV, AI, LG · Sep 15, 2023

Toward responsible face datasets: modeling the distribution of a disentangled latent space for sampling face images from demographic groups

arXiv:2309.08442v1 · 7 citations · h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses fairness issues in facial recognition that affect specific demographic groups, offering an incremental improvement over existing dataset collection methods.

The paper tackles the problem of biased facial recognition systems by proposing a method to generate balanced synthetic face datasets from demographic groups using a disentangled StyleGAN latent space, enabling effective synthesis of any demographic combination with identities distinct from the original training data.

Recently, it has been shown that some modern facial recognition systems can discriminate against specific demographic groups and may lead to unfair attention with respect to various facial attributes such as gender and origin. The main reason is the bias in the datasets used to train these models, in particular their unbalanced demographics. Unfortunately, collecting a large-scale dataset that is balanced with respect to various demographics is impracticable. In this paper, we investigate as an alternative the generation of a balanced and possibly bias-free synthetic dataset that could be used to train, regularize, or evaluate deep learning-based facial recognition models. We propose a simple method for modeling and sampling a disentangled projection of a StyleGAN latent space to generate any combination of demographic groups (e.g., hispanic-female). Our experiments show that we can synthesize any combination of demographic groups effectively and that the identities differ from those in the original training dataset. We have also released the source code.
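The abstract does not say how the latent distribution is modeled, so the following is only an illustrative sketch of the general idea of "modeling and sampling" a latent space per demographic group. It assumes, purely for illustration, that each group's latent vectors are fit with a single multivariate Gaussian. The function names `fit_group_gaussians` and `sample_group`, the 8-dimensional toy latents, and the label `other-group` are all hypothetical and are not taken from the paper or its released code.

```python
import numpy as np


def fit_group_gaussians(latents, labels):
    """Fit one Gaussian (mean, covariance) per group label.

    ASSUMPTION: a single Gaussian per group is an illustrative choice,
    not necessarily the model used in the paper.
    """
    labels = np.asarray(labels)
    models = {}
    for group in np.unique(labels):
        z = latents[labels == group]
        mean = z.mean(axis=0)
        # A small ridge keeps the covariance well conditioned.
        cov = np.cov(z, rowvar=False) + 1e-6 * np.eye(z.shape[1])
        models[str(group)] = (mean, cov)
    return models


def sample_group(models, group, n, rng):
    """Draw n new latent vectors from the Gaussian fitted for `group`."""
    mean, cov = models[group]
    return rng.multivariate_normal(mean, cov, size=n)


# Toy stand-in for projected latent codes: two groups with shifted means.
rng = np.random.default_rng(0)
dim = 8  # hypothetical; real StyleGAN latents are much larger
latents = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(200, dim)),
    rng.normal(loc=3.0, scale=1.0, size=(200, dim)),
])
labels = ["hispanic-female"] * 200 + ["other-group"] * 200

models = fit_group_gaussians(latents, labels)
samples = sample_group(models, "hispanic-female", n=5, rng=rng)
```

In a full pipeline the sampled vectors would still have to be mapped back into the generator's latent space and passed through StyleGAN to obtain images; that step is omitted here because it depends on details not given in the abstract.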

