CL · SD · AS · Mar 1, 2023

Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition

arXiv:2303.00802v1 · 6 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses accent bias in ASR for non-native English speakers, but it is an incremental advance: it builds on existing accent-conversion models and shows limited generalization to unseen accents.

The authors tackle the problem of biased automatic speech recognition (ASR) systems that perform worse for non-native speakers. They improve an accent-conversion model with phonetic knowledge and learned accent representations, and find that synthetically accented training data helps ASR systems recognize speech from accents seen during training, but not from unseen accents.

Awareness of biased ASR datasets and models has increased notably in recent years. Even for English, despite a vast amount of available training data, systems perform worse for non-native speakers. In this work, we improve an accent-conversion model (ACM) which transforms native US-English speech into accented pronunciation. We include phonetic knowledge in the ACM training to provide accurate feedback about how well certain pronunciation patterns were recovered in the synthesized waveform. Furthermore, we investigate the feasibility of learned accent representations instead of static embeddings. Generated data was then used to train two state-of-the-art ASR systems. We evaluated our approach on native and non-native English datasets and found that synthetically accented data helped the ASR systems better understand speech from seen accents. This observation did not carry over to unseen accents, and it was not observed for a model that had been pre-trained exclusively on native speech.
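The abstract says phonetic knowledge gives the ACM feedback on how well pronunciation patterns were recovered in the synthesized waveform. One simple way to quantify such recovery, assumed here purely for illustration and not necessarily the paper's training signal, is a phoneme error rate (PER) between the intended accented phoneme sequence and the phonemes a recognizer extracts from the synthesized audio. The sketch below computes PER with a Levenshtein alignment; the function names and phoneme strings are hypothetical, and an actual training loss would need a differentiable variant rather than this discrete metric.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between phoneme sequences."""
    # prev[j] = distance between the ref prefix processed so far and the first j hyp phonemes
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a target phoneme
                curr[j - 1] + 1,     # insertion of an extra recognized phoneme
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(target: Sequence[str], recognized: Sequence[str]) -> float:
    """Hypothetical helper: share of target (accented) phonemes not recovered in the synthesis."""
    if not target:
        raise ValueError("target phoneme sequence must be non-empty")
    return edit_distance(target, recognized) / len(target)


if __name__ == "__main__":
    # Hypothetical example: the accented target realizes /θ/ in "think" as /s/,
    # but the synthesized waveform still sounds like the native /θ/.
    target = ["s", "ɪ", "ŋ", "k"]
    recognized = ["θ", "ɪ", "ŋ", "k"]
    print(phoneme_error_rate(target, recognized))  # → 0.25
```

A lower PER means the synthesized audio more faithfully carries the intended accented pronunciation; a PER of 0 would mean every target phoneme was recovered.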
