Designing and Generating Diverse, Equitable Face Image Datasets for Face Verification Tasks
This work addresses fairness in face verification for applications such as online banking. Its contribution is incremental: it builds on existing generative methods to improve dataset diversity.
The paper tackles bias in face verification datasets by proposing a methodology for generating diverse synthetic face images. The result is the DIF-V dataset (27,780 images of 926 identities), and the accompanying analysis reveals biases in existing verification models.
Face verification is a key component of identity authentication in applications such as online banking and secure access to personal devices. Most existing face image datasets suffer from notable biases related to race, gender, and other demographic characteristics, which limits the effectiveness and fairness of face verification systems. In response, we propose a comprehensive methodology that integrates advanced generative models to create diverse, high-quality synthetic face images. The methodology emphasizes representation of a wide range of facial traits while adhering to the characteristics permissible in identity card photographs. Furthermore, we introduce the Diverse and Inclusive Faces for Verification (DIF-V) dataset, comprising 27,780 images of 926 unique identities (an average of 30 images per identity), designed as a benchmark for future research in face verification. Our analysis reveals that existing verification models exhibit biases toward certain genders and races and, notably, that applying identity style modifications degrades model performance. By tackling the inherent inequities in existing datasets, this work not only enriches the discussion on diversity and ethics in artificial intelligence but also lays the foundation for developing more inclusive and reliable face verification technologies.
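To make the notion of demographic bias in a verification model concrete, the following is a minimal sketch of one common way to quantify it: comparing false non-match and false match rates across groups at a fixed decision threshold. It is not the evaluation protocol used for DIF-V, which the abstract does not specify. The function name `per_group_error_rates`, the group labels, the similarity scores, and the threshold of 0.5 are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def per_group_error_rates(pairs, threshold):
    """Compute per-group false non-match and false match rates.

    pairs: iterable of (group, similarity, is_same_identity) tuples,
           where similarity is a model's score for an image pair.
    threshold: pairs scoring at or above this value are accepted as a match.
    """
    counts = defaultdict(
        lambda: {"fnm": 0, "genuine": 0, "fm": 0, "impostor": 0}
    )
    for group, similarity, is_same in pairs:
        c = counts[group]
        accepted = similarity >= threshold
        if is_same:
            c["genuine"] += 1
            if not accepted:
                c["fnm"] += 1  # genuine pair wrongly rejected
        else:
            c["impostor"] += 1
            if accepted:
                c["fm"] += 1  # impostor pair wrongly accepted

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            # None when a group has no pairs of the relevant kind
            "fnmr": c["fnm"] / c["genuine"] if c["genuine"] else None,
            "fmr": c["fm"] / c["impostor"] if c["impostor"] else None,
        }
    return rates


# Toy, made-up scores for two hypothetical demographic groups.
toy_pairs = [
    ("group_a", 0.9, True),
    ("group_a", 0.8, True),
    ("group_a", 0.2, False),
    ("group_a", 0.6, False),
    ("group_b", 0.4, True),
    ("group_b", 0.7, True),
    ("group_b", 0.1, False),
    ("group_b", 0.3, False),
]

rates = per_group_error_rates(toy_pairs, threshold=0.5)
for group, r in sorted(rates.items()):
    print(group, r)
```

A large gap between groups in either rate at the same threshold is one signal of the kind of bias the abstract describes. In practice the scores would come from a verification model applied to labeled image pairs rather than from hand-written values.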