ASCL Jul 13, 2019

Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition

arXiv:1907.06017v1, 39 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of improving speech recognition accuracy on Chinese datasets by transferring knowledge from language models. The contribution is incremental, since it builds on existing knowledge distillation and fusion techniques.

The paper tackles the problem of integrating external language models into sequence-to-sequence speech recognition without adding components at test time. It proposes a knowledge distillation approach that achieves a character error rate of 9.3%, an 18.42% relative reduction compared with the baseline.
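The two figures quoted above are linked by simple arithmetic, which the following minimal sketch spells out. The baseline character error rate is not stated in the text above; the value computed here is back-derived from the reported 9.3% and 18.42% and should be read as an implied figure, not a quoted one. The function names are illustrative.

```python
def relative_reduction(baseline_cer: float, new_cer: float) -> float:
    """Relative error reduction, expressed as a fraction of the baseline."""
    return (baseline_cer - new_cer) / baseline_cer


def implied_baseline(new_cer: float, rel_reduction: float) -> float:
    """Baseline CER implied by a new CER and a relative reduction (assumption: derived, not quoted)."""
    return new_cer / (1.0 - rel_reduction)


reported_cer = 9.3          # percent, from the summary
reported_reduction = 0.1842  # 18.42% relative, from the summary

baseline = implied_baseline(reported_cer, reported_reduction)
print(f"implied baseline CER: {baseline:.2f}%")
print(f"check: {relative_reduction(baseline, reported_cer):.4f}")
```

Under these numbers the implied baseline comes out at roughly 11.4%, so the absolute improvement is about 2.1 percentage points.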

Integrating an external language model into a sequence-to-sequence speech recognition system is non-trivial. Previous works use linear interpolation or a fusion network to integrate external language models. However, these approaches introduce external components and increase decoding computation. In this paper, we instead propose a knowledge-distillation-based training approach for integrating external language models into a sequence-to-sequence model. A recurrent neural network language model, trained on large-scale external text, generates soft labels to guide the training of the sequence-to-sequence model; the language model thus plays the role of the teacher. This approach does not add any external component to the sequence-to-sequence model during testing, and it can still be combined with shallow fusion for decoding. The experiments are conducted on the public Chinese datasets AISHELL-1 and CLMAD. Our approach achieves a character error rate of 9.3%, a relative reduction of 18.42% compared with the vanilla sequence-to-sequence model.
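The training idea in the abstract can be illustrated with a minimal sketch of a distillation loss, in which the student (the sequence-to-sequence model) is trained against both the ground-truth characters and the teacher's soft labels. This is a generic formulation, not necessarily the exact loss used in the paper: the interpolation weight `lam`, the `temperature` parameter, and all function names are assumptions introduced for illustration, and the teacher logits are simply passed in rather than produced by an actual recurrent language model.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]


def softmax(logits, temperature=1.0):
    """Softmax with an optional temperature (assumed hyperparameter)."""
    return [math.exp(lp) for lp in log_softmax([x / temperature for x in logits])]


def distillation_loss(student_logits, teacher_logits, targets, lam=0.5, temperature=1.0):
    """Average per-step loss over one sequence.

    student_logits, teacher_logits: one list of vocabulary logits per output step.
    targets: ground-truth character index per step.
    lam: weight on the teacher's soft labels (hypothetical value).
    """
    total = 0.0
    for s_logits, t_logits, y in zip(student_logits, teacher_logits, targets):
        log_p = log_softmax(s_logits)
        hard = -log_p[y]  # cross-entropy against the ground-truth label
        soft_labels = softmax(t_logits, temperature)
        soft = -sum(q * lp for q, lp in zip(soft_labels, log_p))  # against soft labels
        total += (1.0 - lam) * hard + lam * soft
    return total / len(targets)


# Toy example: 2 output steps, vocabulary of 3 characters.
student = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher = [[1.5, 1.0, -2.0], [0.0, 0.0, 2.0]]
loss = distillation_loss(student, teacher, targets=[0, 2], lam=0.5)
print(f"distillation loss: {loss:.4f}")
```

Because the teacher only contributes to the training loss, nothing from the language model needs to be kept at test time, which matches the abstract's claim that no external component is added during decoding.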
