Towards measuring fairness in speech recognition: Fair-Speech dataset
This work addresses the problem of measuring ASR fairness across demographic groups. Its contribution is incremental, since it focuses on dataset creation rather than new methods.
The paper tackles the lack of datasets for evaluating fairness in speech recognition by introducing Fair-Speech, a publicly released corpus with approximately 26.5K utterances from 593 diverse U.S. participants, and provides ASR baselines for comparison.
Current public datasets for automatic speech recognition (ASR) tend not to focus specifically on fairness, such as performance across different demographic groups. This paper introduces a novel dataset, Fair-Speech, a publicly released corpus to help researchers evaluate the accuracy of their ASR models across diverse self-reported demographic attributes, such as age, gender, ethnicity, geographic variation, and whether the participants consider themselves native English speakers. Our dataset includes approximately 26.5K utterances recorded by 593 people in the United States, who were paid to record and submit audio of themselves saying voice commands. We also provide ASR baselines, including for models trained on transcribed and untranscribed social media videos and for open source models.
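To make the intended use concrete, the following is a minimal sketch of a per-group evaluation, assuming word error rate (WER) is the accuracy metric and that each utterance carries a reference transcript, an ASR hypothesis, and one self-reported demographic label. The record layout, the field name `first_language_english`, and the helper names `word_errors` and `wer_by_group` are hypothetical and not taken from the Fair-Speech release; the toy utterances are made up for illustration only.

```python
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(records, group_key):
    """Aggregate WER per group: total word errors / total reference words."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for rec in records:
        e, n = word_errors(rec["reference"], rec["hypothesis"])
        errors[rec[group_key]] += e
        words[rec[group_key]] += n
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Hypothetical toy records; real data would come from the corpus metadata.
records = [
    {"reference": "play some music", "hypothesis": "play some music",
     "first_language_english": "yes"},
    {"reference": "call my mom", "hypothesis": "call mom",
     "first_language_english": "yes"},
    {"reference": "set a timer", "hypothesis": "set the timer",
     "first_language_english": "no"},
]

for group, wer in sorted(wer_by_group(records, "first_language_english").items()):
    print(f"{group}: WER = {wer:.3f}")
```

Pooling errors and reference words within each group, rather than averaging per-utterance WER, keeps short utterances from dominating the group score. Comparing the resulting per-group values is one simple way to surface accuracy gaps between groups, though it is only one of several possible fairness measures.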