CL, SD, AS · Aug 23, 2019

Gender Representation in French Broadcast Corpora and Its Impact on ASR Performance

arXiv:1908.08717v1 · 65 citations
AI Analysis

This paper addresses gender bias in ASR systems for French broadcast data and offers an incremental analysis of existing corpora.

The paper analyzes gender representation in French broadcast corpora and finds that women are underrepresented in terms of speakers and speech turns, especially among anchors. This imbalance leads to lower ASR performance for women, although the effect can be mitigated for media-experienced speakers when sufficient data is available.

This paper analyzes gender representation in four major French broadcast corpora. Because these corpora are widely used within the speech processing community, they are primary material for training automatic speech recognition (ASR) systems. As gender bias has been highlighted in numerous natural language processing (NLP) applications, we study the impact of the gender imbalance in TV and radio broadcast on the performance of an ASR system. This analysis shows that women are under-represented in our data in terms of speakers and speech turns. We introduce the notion of speaker role to refine our analysis and find that women are even fewer within the Anchor category, which corresponds to prominent speakers. The disparity in available data between the two genders causes performance to decrease on women. However, this global trend can be counterbalanced for speakers who are used to speaking in the media when a sufficient amount of data is available.
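
The two quantities the abstract relies on, representation by speakers and speech turns and recognition performance per gender, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the `(speaker_id, gender)` turn format, and the toy records below are all hypothetical, and word error rate (WER) is assumed here as the performance measure, computed as a word-level edit distance divided by the reference length.

```python
from collections import defaultdict


def representation_by_gender(turns):
    """Return {gender: (share of distinct speakers, share of speech turns)}.

    `turns` is an iterable of (speaker_id, gender) pairs, one per speech turn.
    This input format is an assumption made for the sketch.
    """
    speakers = defaultdict(set)
    turn_counts = defaultdict(int)
    for speaker_id, gender in turns:
        speakers[gender].add(speaker_id)
        turn_counts[gender] += 1
    total_speakers = sum(len(ids) for ids in speakers.values())
    total_turns = sum(turn_counts.values())
    return {
        gender: (len(speakers[gender]) / total_speakers,
                 turn_counts[gender] / total_turns)
        for gender in speakers
    }


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Toy, made-up records for illustration only (not data from the paper).
toy_turns = [("spk1", "F"), ("spk2", "M"), ("spk2", "M"), ("spk3", "M")]
shares = representation_by_gender(toy_turns)
toy_wer = word_error_rate("le chat dort", "le chien dort")
```

Counting distinct speakers and speech turns separately matters because a group can account for a given share of speakers while holding a smaller share of the turns. Averaging `word_error_rate` over the utterances of each gender, or of each speaker role, would give the kind of per-group comparison the abstract describes.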

