SD, AS · Feb 19, 2021

AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge

arXiv:2102.09828v1 · 29 citations
Originality: Incremental advance
AI Analysis

This work addresses accent identification for English speech recognition. It is an incremental advance, and its gains are demonstrated on a single domain-specific challenge.

The paper tackles accent identification in English speech with a system that reaches 83.63% average accuracy, ranks first in the challenge, and leads all other participants by more than 10 percentage points (absolute).
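Two details of the headline numbers are easy to misread: what "average accuracy" averages over, and whether "more than 10%" is an absolute or a relative gap. The sketch below assumes "average accuracy" means the unweighted (macro) mean of per-accent accuracies. That is an assumption, since the challenge may define the metric differently. It contrasts this with pooled accuracy and separates absolute from relative margins. The function names, accent labels, and the 80.0 vs 70.0 scores are hypothetical illustrations, not challenge data.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def per_accent_accuracy(labels: Sequence[Hashable],
                        predictions: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Accuracy computed separately for each true accent label."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for truth, guess in zip(labels, predictions):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    return {accent: correct[accent] / total[accent] for accent in total}


def average_accuracy(labels: Sequence[Hashable], predictions: Sequence[Hashable]) -> float:
    """Unweighted (macro) mean of per-accent accuracies; an ASSUMED metric definition."""
    accs = per_accent_accuracy(labels, predictions)
    return sum(accs.values()) / len(accs)


def overall_accuracy(labels: Sequence[Hashable], predictions: Sequence[Hashable]) -> float:
    """Pooled (micro) accuracy over all utterances, shown for comparison."""
    hits = sum(1 for truth, guess in zip(labels, predictions) if truth == guess)
    return hits / len(labels)


def margins(ours: float, theirs: float) -> Tuple[float, float]:
    """Return (absolute gap in percentage points, relative gap as a fraction of theirs)."""
    return ours - theirs, (ours - theirs) / theirs


# Hypothetical toy labels, not challenge data.
labels = ["US", "US", "US", "UK"]
preds = ["US", "US", "UK", "UK"]
print(per_accent_accuracy(labels, preds))  # US: 2/3, UK: 1.0
print(average_accuracy(labels, preds))     # (2/3 + 1) / 2 = 0.8333...
print(overall_accuracy(labels, preds))     # 3/4 = 0.75
print(margins(80.0, 70.0))                 # (10.0, 0.142857...)
```

Under these toy numbers, a 10-point absolute lead over a 70% system is only about a 14% relative lead. That is why the qualifier "in absolute terms" in the abstract matters. The macro and pooled averages also diverge whenever accents have unequal test counts.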

This paper describes the AISpeech-SJTU system for the accent identification track of the Interspeech 2020 Accented English Speech Recognition Challenge. In this track, only 160 hours of accented English data collected from 8 countries, plus the auxiliary LibriSpeech dataset, are provided for training. To build an accurate and robust accent identification system, we explore the whole system pipeline in detail. First, we introduce the ASR-based phone posteriorgram (PPG) feature to accent identification and verify its efficacy. Then, we carefully design a novel TTS-based approach that, for the first time, augments the very limited accent training data. Finally, we propose test-time augmentation and embedding fusion schemes to further improve system performance. Our final system ranked first in the challenge and outperformed all other participants by a large margin: the submitted system achieves 83.63% average accuracy on the challenge evaluation data, more than 10% ahead of the others in absolute terms.
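The abstract names three inference-side ingredients: PPG features, test-time augmentation, and embedding fusion. It does not say how they are computed. The sketch below is a minimal illustration under stated assumptions and is not the authors' implementation. It treats a PPG as a per-frame softmax over phone classes from an ASR acoustic model, which is the usual meaning of the term. It assumes test-time augmentation averages accent posteriors over several augmented views of one utterance, such as crops. It also assumes embedding fusion concatenates L2-normalised embeddings from several systems, though averaging or score-level fusion would be equally plausible readings. All function names and the toy classifier are hypothetical. The TTS-based data augmentation is not shown because it needs a speech synthesiser.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over one vector of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def phone_posteriorgram(frame_logits: Sequence[Sequence[float]]) -> List[Vector]:
    """PPG: one phone-posterior distribution per frame, from acoustic-model logits."""
    return [softmax(frame) for frame in frame_logits]


def tta_posteriors(utterance, augmentations: Sequence[Callable],
                   classify: Callable) -> Vector:
    """ASSUMED TTA scheme: average accent posteriors over augmented views of one utterance."""
    views = [softmax(classify(augment(utterance))) for augment in augmentations]
    n_classes = len(views[0])
    return [sum(view[k] for view in views) / len(views) for k in range(n_classes)]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid division by zero
    return [x / norm for x in vec]


def fuse_embeddings(embeddings: Sequence[Sequence[float]]) -> Vector:
    """ASSUMED fusion scheme: concatenate L2-normalised embeddings from several systems."""
    fused: Vector = []
    for emb in embeddings:
        fused.extend(l2_normalize(emb))
    return fused


def toy_classifier(frames: Sequence[float]) -> Vector:
    """Hypothetical stand-in model emitting two accent logits."""
    return [sum(frames), 0.0]


# Hypothetical toy usage: an "utterance" is a list of frame scores and each view is a crop.
utterance = [1.0, 2.0, 3.0]
crops = [lambda u: u, lambda u: u[:2], lambda u: u[1:]]
posterior = tta_posteriors(utterance, crops, toy_classifier)
print(posterior.index(max(posterior)))             # 0
print(fuse_embeddings([[3.0, 4.0], [0.0, 2.0]]))   # [0.6, 0.8, 0.0, 1.0]
print(phone_posteriorgram([[0.0, 0.0]]))           # [[0.5, 0.5]]
```

Normalising each embedding before concatenation keeps a system whose embeddings have a larger magnitude from dominating whatever backend classifier consumes the fused vector. Averaging posteriors rather than hard votes lets confident views outweigh uncertain ones.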
