CL SD AS · Feb 7, 2018

Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition

Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Mark Hasegawa-Johnson

arXiv:1802.02656v13.673 citations

Originality: Incremental advance

AI Analysis

This work addresses accent mismatch in speech recognition with a joint accent-classification and acoustic-modeling method that improves performance in multi-accent scenarios. The advance is incremental relative to existing multi-task approaches.

The paper tackles the degradation of automatic speech recognition performance caused by accent mismatch by jointly learning an accent classifier and a multi-task acoustic model. Relative to a strong multi-task acoustic model baseline, it reports word error rate improvements of 5.94% on British English and 9.47% on American English.
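The reported figures are relative improvements, not absolute reductions in word error rate. A minimal sketch of how a relative figure is conventionally computed is shown below. The function name and the example error rates are hypothetical and chosen only to illustrate the arithmetic. They are not values from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of new_wer versus baseline_wer, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical error rates (NOT from the paper): a drop from 10.0% to 9.0% WER
# is a 1.0-point absolute reduction and a 10% relative improvement.
example = relative_wer_improvement(baseline_wer=10.0, new_wer=9.0)
```

Because the improvement is divided by the baseline error rate, the same absolute reduction gives a larger relative figure when the baseline error rate is lower.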

The performance of automatic speech recognition systems degrades with increasing mismatch between the training and testing scenarios. Differences in speaker accents are a significant source of such mismatch. The traditional approach to deal with multiple accents involves pooling data from several accents during training and building a single model in multi-task fashion, where tasks correspond to individual accents. In this paper, we explore an alternate model where we jointly learn an accent classifier and a multi-task acoustic model. Experiments on the American English Wall Street Journal and British English Cambridge corpora demonstrate that our joint model outperforms the strong multi-task acoustic model baseline. We obtain a 5.94% relative improvement in word error rate on British English, and 9.47% relative improvement on American English. This illustrates that jointly modeling with accent information improves acoustic model performance.
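The abstract does not say how the accent classifier and the acoustic model are trained together. One common way to train two such objectives jointly is to interpolate their losses, sketched below under that assumption. The interpolation weight `alpha`, the function names, and the scalar `acoustic_loss` placeholder are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the target class."""
    return -math.log(softmax(logits)[target_index])


def joint_loss(acoustic_loss, accent_logits, accent_label, alpha=0.5):
    """Interpolate an acoustic-model loss with an accent-classification loss.

    acoustic_loss: precomputed scalar loss of the acoustic model (placeholder).
    accent_logits: unnormalized accent scores, one per accent.
    accent_label:  index of the true accent.
    alpha:         assumed interpolation weight in [0, 1]; alpha=0 ignores accents.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    accent_loss = cross_entropy(accent_logits, accent_label)
    return (1.0 - alpha) * acoustic_loss + alpha * accent_loss


# Two accents (e.g. index 0 = American, 1 = British) with uninformative logits.
loss = joint_loss(acoustic_loss=2.0, accent_logits=[0.0, 0.0], accent_label=0)
```

In this sketch, setting `alpha` to 0 recovers training on the acoustic loss alone, while larger values put more weight on predicting the accent correctly.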
