AS, LG, SD, ML · Jul 1, 2019

LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring

arXiv:1907.01030v1 · 34 citations
Originality: Incremental advance
AI Analysis

This work addresses a known bottleneck in LVCSR, namely the cost of using LSTM language models during decoding, and offers an incremental improvement in decoding efficiency.

The paper tackles the challenge of efficiently integrating LSTM language models into large vocabulary continuous speech recognition (LVCSR) systems. It proposes a method that combines first-pass decoding with lattice rescoring and achieves competitive results on the Hub5'00 and Librispeech evaluation corpora while running faster than real time.
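
To make the rescoring step concrete, here is a naive sketch under loudly stated assumptions: the lattice, its acoustic scores, and `toy_lm` are all hypothetical, and `toy_lm` only stands in for an LSTM-LM (it looks at the previous word, whereas an LSTM-LM conditions on the full history). It enumerates every path, which is exact but only feasible for tiny lattices; it does not reproduce the paper's efficient procedure.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy lattice: node -> [(next_node, word, acoustic log-prob)].
# Node 0 is the start node and node 3 the final node; all scores are invented.
Lattice = Dict[int, List[Tuple[int, str, float]]]
LATTICE: Lattice = {
    0: [(1, "the", -1.0), (1, "a", -0.9)],
    1: [(2, "cat", -2.0), (2, "cap", -1.8)],
    2: [(3, "sat", -1.5)],
    3: [],
}


def toy_lm(history: Tuple[str, ...], word: str) -> float:
    """Stand-in for an LSTM-LM, returning log P(word | history).

    A real LSTM-LM conditions on the whole history; this toy only looks at
    the previous word so the example stays easy to check by hand.
    """
    prev = history[-1] if history else "<s>"
    likely = {("<s>", "the"), ("the", "cat"), ("cat", "sat")}
    return math.log(0.5) if (prev, word) in likely else math.log(0.05)


def rescore(lattice: Lattice, start: int, final: int,
            lm: Callable[[Tuple[str, ...], str], float],
            lm_scale: float) -> Tuple[float, List[str]]:
    """Return the best (score, words) over all start-to-final paths.

    Path score = sum of acoustic log-probs + lm_scale * sum of LM log-probs.
    Enumerating every path is exact but only feasible for tiny lattices.
    """
    best: Tuple[float, List[str]] = (-math.inf, [])

    def walk(node: int, words: Tuple[str, ...], score: float) -> None:
        nonlocal best
        if node == final:
            if score > best[0]:
                best = (score, list(words))
            return
        for nxt, word, am in lattice[node]:
            # The LM sees the full word history accumulated along this path.
            walk(nxt, words + (word,), score + am + lm_scale * lm(words, word))

    walk(start, (), 0.0)
    return best
```

With `lm_scale=1.0` the LM pulls the decision to `the cat sat`; with `lm_scale=0.0` the acoustically best path `a cap sat` wins.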

LSTM-based language models are an important part of modern LVCSR systems, as they significantly improve performance over traditional backoff language models. Incorporating them efficiently into decoding has been notoriously difficult. In this paper we present an approach based on a combination of one-pass decoding and lattice rescoring. We perform decoding with the LSTM-LM in the first pass but recombine hypotheses that share the last two words; afterwards we rescore the resulting lattice. We run our systems on GPGPU-equipped machines and are able to produce competitive results on the Hub5'00 and Librispeech evaluation corpora with a runtime better than real time. In addition, we briefly investigate the possibility of carrying out the full sum over all state sequences belonging to a given word hypothesis during decoding, without recombination.
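
The recombination rule described in the abstract can be sketched as follows. This is an illustration, not the authors' decoder: `Hyp`, `recombine` and `word_log_score` are hypothetical names, all hypotheses are assumed to end at the same time frame and search state, and recombined-away histories are assumed to be kept as lattice alternatives for the later rescoring pass. The second function contrasts the usual Viterbi (max) approximation over state sequences with the full sum mentioned at the end of the abstract.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Hyp:
    """Hypothetical search hypothesis at one word end (assumed: same frame, same search state)."""
    words: Tuple[str, ...]  # full word history; an LSTM-LM state depends on all of it
    score: float            # combined acoustic + LM log-score so far
    # Histories merged into this one; assumed to be kept as lattice alternatives.
    merged: List[Tuple[str, ...]] = field(default_factory=list)


def recombine(hyps: List[Hyp], context: int = 2) -> List[Hyp]:
    """Merge hypotheses whose last `context` words agree; the best-scoring one survives."""
    groups: Dict[Tuple[str, ...], Hyp] = {}
    for h in sorted(hyps, key=lambda h: h.score, reverse=True):
        key = h.words[-context:]
        if key in groups:
            survivor = groups[key]
            survivor.merged.append(h.words)  # keep the loser as a lattice alternative
            survivor.merged.extend(h.merged)
        else:
            groups[key] = Hyp(h.words, h.score, list(h.merged))
    return list(groups.values())


def word_log_score(alignment_log_scores: List[float], full_sum: bool) -> float:
    """Score of one word hypothesis from the log-scores of its state sequences.

    Viterbi approximation: keep only the best state sequence (max).
    Full sum: add the probabilities of all state sequences (log-sum-exp).
    """
    best = max(alignment_log_scores)
    if not full_sum:
        return best
    return best + math.log(sum(math.exp(s - best) for s in alignment_log_scores))
```

Keying on the last two words caps how many distinct LSTM histories stay alive in the first pass; histories that differ only further back are merged approximately, which is presumably the error the subsequent lattice rescoring can partly recover.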
