CLAS · Jul 15, 2019

Investigation on N-gram Approximated RNNLMs for Recognition of Morphologically Rich Speech

arXiv:1907.06407v3 · 5 citations
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in speech recognition for morphologically rich languages such as Hungarian. It is an incremental advance because it builds on existing approximation techniques.

The paper tackles the processing delay that two-pass decoding introduces in Hungarian conversational telephone speech recognition by approximating RNNLMs with n-gram models. The approximation recovers 40% of the RNNLM's perplexity reduction, and combined with morph-based modeling it yields an 8% relative WER reduction while maintaining real-time operation.

Recognition of Hungarian conversational telephone speech is challenging due to the informal style and morphological richness of the language. A Recurrent Neural Network Language Model (RNNLM) can remedy the high perplexity of the task; however, two-pass decoding introduces a considerable processing delay. To eliminate this delay, we investigate approaches that reduce the complexity of the RNNLM while preserving its accuracy. We compare the performance of conventional back-off n-gram language models (BNLM), BNLM approximations of RNNLMs (RNN-BNLM), and RNN n-grams in terms of perplexity and word error rate (WER). Morphological richness is often addressed by using statistically derived subwords (morphs) in the language models, hence our investigations are extended to morph-based models as well. We found that, using RNN-BNLMs, 40% of the RNNLM perplexity reduction can be recovered, which is roughly equal to the performance of an RNN 4-gram model. Combining morph-based modeling and approximation of the RNNLM, we were able to achieve an 8% relative WER reduction and preserve real-time operation of our conversational telephone speech recognition system.
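The two headline figures in the abstract are ratios, and a small sketch may help make them concrete. It assumes one natural reading of "recovered perplexity reduction": the approximated model's perplexity gain over the baseline BNLM, divided by the full RNNLM's gain over that same baseline. The function names and all numeric values below are hypothetical and chosen only so the arithmetic is easy to follow; they are not results from the paper.

```python
def recovered_fraction(ppl_baseline: float, ppl_rnnlm: float, ppl_approx: float) -> float:
    """Share of the RNNLM's perplexity reduction that an approximation retains.

    Assumed definition (not confirmed by the source):
    (baseline - approximation) / (baseline - RNNLM).
    """
    full_gain = ppl_baseline - ppl_rnnlm
    if full_gain <= 0:
        raise ValueError("the RNNLM must have lower perplexity than the baseline")
    return (ppl_baseline - ppl_approx) / full_gain


def relative_wer_reduction(wer_baseline: float, wer_new: float) -> float:
    """Relative WER reduction: the absolute WER drop divided by the baseline WER."""
    if wer_baseline <= 0:
        raise ValueError("the baseline WER must be positive")
    return (wer_baseline - wer_new) / wer_baseline


# Illustrative values only (hypothetical, not taken from the paper).
example_recovery = recovered_fraction(ppl_baseline=200.0, ppl_rnnlm=150.0, ppl_approx=180.0)
example_wer_gain = relative_wer_reduction(wer_baseline=25.0, wer_new=23.0)

print(f"recovered perplexity reduction: {example_recovery:.0%}")
print(f"relative WER reduction: {example_wer_gain:.0%}")
```

Under this reading, a relative WER reduction is always smaller in absolute terms than it sounds: with the made-up values above, an 8% relative reduction corresponds to a drop of 2 absolute WER points.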

