ML · CL · LG · Nov 15, 2017

Lattice Rescoring Strategies for Long Short Term Memory Language Models in Speech Recognition

arXiv:1711.05448v1 · 41 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses efficiency challenges for speech recognition practitioners, but it is incremental as it builds on existing lattice-rescoring methods.

The paper tackles the problem of integrating computationally expensive LSTM language models into speech recognition systems by evaluating lattice-rescoring algorithms, resulting in an 8% relative reduction in word error rate on a YouTube task compared to N-gram models.

Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional N-gram LMs on speech recognition tasks. However, these models are computationally more expensive than N-gram LMs for decoding, and thus challenging to integrate into speech recognizers. Recent research has proposed lattice-rescoring algorithms using RNNLMs and LSTMLMs as an efficient strategy for integrating these models into a speech recognition system. In this paper, we evaluate existing lattice-rescoring algorithms along with new variants on a YouTube speech recognition task. Lattice rescoring using LSTMLMs reduces the word error rate (WER) for this task by 8% relative to the WER obtained using an N-gram LM.
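
The 8% figure in the abstract is a relative reduction, not a drop of 8 absolute WER points. A minimal sketch of that arithmetic follows. The function name and the two WER values are hypothetical, chosen only so the result comes out near 8%; they are not the paper's reported numbers.

```python
def relative_wer_reduction(baseline_wer: float, rescored_wer: float) -> float:
    """Return the fractional WER improvement of the rescored system over the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - rescored_wer) / baseline_wer


# Hypothetical WERs in percent (illustrative only, NOT taken from the paper).
ngram_wer = 12.5   # assumed N-gram LM baseline
lstm_wer = 11.5    # assumed WER after LSTM LM lattice rescoring

reduction = relative_wer_reduction(ngram_wer, lstm_wer)
print(f"absolute drop: {ngram_wer - lstm_wer:.1f} points")
print(f"relative drop: {reduction:.1%}")
```

With these assumed values, a one-point absolute drop corresponds to roughly an 8% relative reduction, which is why the two measures should not be confused when comparing systems.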

