CL LG SD AS

Oct 31, 2018

Attention-based sequence-to-sequence model for speech recognition: development of state-of-the-art system on LibriSpeech and its application to non-native English

Yan Yin, Ramon Prieto, Bin Wang, Jianwei Zhou, Yiwei Gu, Yang Liu, Hui Lin

arXiv:1810.13088v2

0.22 citations

Originality: Incremental advance

AI Analysis

This work improves speech recognition accuracy on LibriSpeech English and on non-native English speech, but it is incremental because it builds on existing attention-based models.

The paper tackles speech recognition by developing an attention-based sequence-to-sequence model, achieving a state-of-the-art word error rate of 3.43% on the LibriSpeech test-clean subset and competitive results on non-native English speech.

Recent research has shown that attention-based sequence-to-sequence models such as Listen, Attend, and Spell (LAS) yield comparable results to state-of-the-art ASR systems on various tasks. In this paper, we describe the development of such a system and demonstrate its performance on two tasks: first we achieve a new state-of-the-art word error rate of 3.43% on the test clean subset of LibriSpeech English data; second on non-native English speech, including both read speech and spontaneous speech, we obtain very competitive results compared to a conventional system built with the most updated Kaldi recipe.
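The headline figure above is a word error rate (WER). As a rough illustration of what that metric measures, the sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is a minimal sketch that assumes whitespace tokenization and no text normalization. The function name `word_error_rate` is hypothetical, and this is not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: both strings are already normalized and are tokenized
    on whitespace; real scoring pipelines usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 3.43% corresponds to roughly 3.43 word errors per 100 reference words. Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.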
