CL, LG · Oct 23, 2019

Efficient Dynamic WFST Decoding for Personalized Language Models

arXiv:1910.10670v1 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses an efficiency bottleneck in speech recognition for personalized applications and represents an incremental improvement.

The paper tackles slow dynamic WFST decoding with personalized language models by proposing a two-layer cache mechanism and two pre-initialization methods. In experiments on a calling speech recognition task, decoding time drops by a factor of three with the public cache and by a factor of five when the private cache is added.

We propose a two-layer cache mechanism to speed up dynamic WFST decoding with personalized language models. The first layer is a public cache that stores most of the static part of the graph and is shared globally among all users. The second layer is a private cache that stores the graph representing the personalized language model and is shared only by the utterances from a particular user. We also propose two simple yet effective pre-initialization methods: one based on breadth-first search, and another based on a data-driven exploration of decoder states using previous utterances. Experiments with a calling speech recognition task using a personalized contact list demonstrate that the proposed public cache reduces decoding time by a factor of three compared to decoding without pre-initialization. Using the private cache provides additional efficiency gains, reducing the decoding time by a factor of five.
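The two-layer cache and the two pre-initialization methods can be illustrated with a minimal Python sketch. It assumes a hypothetical decoding graph whose states are expanded lazily by an `expand(state, user)` function, plus a predicate `is_static` that separates static states from personalized ones, and it assumes a static state's outgoing arcs never depend on the user. `TwoLayerCache`, `preinit_bfs`, `preinit_from_history`, `toy_expand`, and the toy contact-list graph are illustrative names only, not the paper's implementation; plain dicts stand in for whatever cache structure a real decoder uses.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

# Hypothetical arc representation: (output label, weight, next state).
Arc = Tuple[str, float, Hashable]


class TwoLayerCache:
    """Sketch of a two-layer cache for a lazily expanded decoding graph.

    `public` is shared by all users and holds expansions of static states;
    `private[user]` holds expansions of that user's personalized states.
    """

    def __init__(self, expand: Callable[[Hashable, Optional[str]], List[Arc]],
                 is_static: Callable[[Hashable], bool]) -> None:
        self.expand = expand        # expensive on-the-fly expansion (hypothetical)
        self.is_static = is_static  # True if a state belongs to the static part
        self.public: Dict[Hashable, List[Arc]] = {}
        self.private: Dict[Optional[str], Dict[Hashable, List[Arc]]] = {}
        self.expansions = 0         # number of expensive expand() calls so far

    def arcs(self, state: Hashable, user: Optional[str]) -> List[Arc]:
        # Route the lookup to the shared layer or to this user's layer.
        if self.is_static(state):
            layer = self.public
        else:
            layer = self.private.setdefault(user, {})
        if state not in layer:
            layer[state] = self.expand(state, user)
            self.expansions += 1
        return layer[state]

    def preinit_bfs(self, start: Hashable, max_states: int) -> None:
        # Breadth-first pre-expansion of static states, bounded by max_states.
        seen = {start}
        queue = deque([start])
        while queue and len(self.public) < max_states:
            state = queue.popleft()
            for _label, _weight, nxt in self.arcs(state, None):
                if self.is_static(nxt) and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)

    def preinit_from_history(self, traces: Iterable[Tuple[str, List[Hashable]]]) -> None:
        # Replay states visited while decoding earlier utterances, given as (user, states).
        for user, states in traces:
            for state in states:
                self.arcs(state, user)


# Toy graph: static states 0..3 form a chain; state 1 also enters a
# personalized contact-list sub-graph ("c", 0) that rejoins at state 2.
CONTACTS = {"alice": ["bob", "carol"], "dave": ["erin"]}


def toy_expand(state: Hashable, user: Optional[str]) -> List[Arc]:
    if isinstance(state, int):
        arcs = [(f"w{state}", 1.0, state + 1)] if state < 3 else []
        if state == 1:
            arcs.append(("<contact>", 0.5, ("c", 0)))
        return arcs
    # Personalized state: one arc per contact name in this user's list.
    return [(name, 2.0, 2) for name in CONTACTS[user]]


cache = TwoLayerCache(toy_expand, lambda s: isinstance(s, int))
cache.preinit_bfs(start=0, max_states=10)    # fills the public layer with states 0, 1, 2, 3
cache.arcs(("c", 0), "alice")                # private expansion for alice
cache.arcs(1, "dave")                        # public hit, no new expansion
cache.arcs(("c", 0), "dave")                 # separate private expansion for dave
print(len(cache.public), cache.expansions)   # 4 6
```

Routing by state type is what lets the public layer be shared: because a static state expands identically for every user, one expansion serves all of them, while contact-list states are expanded once per user and reused only across that user's utterances.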
