AS | AI | CL | LG | SD | May 28, 2025

NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding

NVIDIA
arXiv:2505.22857v1 | 6 citations | h-index: 17 | Has Code | INTERSPEECH
Originality: Incremental advance
AI Analysis

This work addresses slow context-biasing in industrial ASR applications by restructuring statistical n-gram language models for fast, parallel GPU inference.

The paper tackles the computational inefficiency of n-gram language models for context-biasing in ASR by developing NGPU-LM, a GPU-accelerated approach. It closes more than 50% of the accuracy gap between greedy and beam search in out-of-domain scenarios while adding less than 7% computational overhead.

Statistical n-gram language models are widely used for context-biasing tasks in Automatic Speech Recognition (ASR). However, existing implementations lack computational efficiency due to poor parallelization, making context-biasing less appealing for industrial use. This work rethinks data structures for statistical n-gram language models to enable fast and parallel operations for GPU-optimized inference. Our approach, named NGPU-LM, introduces customizable greedy decoding for all major ASR model types - including transducers, attention encoder-decoder models, and CTC - with less than 7% computational overhead. The proposed approach can eliminate more than 50% of the accuracy gap between greedy and beam search for out-of-domain scenarios while avoiding significant slowdown caused by beam search. The implementation of the proposed NGPU-LM is open-sourced.
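To make the idea of biasing greedy decoding with an n-gram language model concrete, here is a minimal sketch. It is not the NGPU-LM implementation: it runs sequentially on the CPU, ignores blank tokens, and uses a toy bigram table. The names `TOY_LM`, `BACKOFF_PENALTY`, `UNKNOWN_LOGPROB`, `lm_score`, `greedy_decode`, and `lm_weight` are all hypothetical and chosen for illustration only. At each step the decoder picks the token that maximizes the acoustic log-probability plus a weighted LM log-probability, a scheme commonly called shallow fusion.

```python
import math

# Hypothetical toy bigram LM: context tuple -> {token: log-probability}.
# The empty context () holds unigram scores used for backoff.
TOY_LM = {
    (): {"a": math.log(0.5), "b": math.log(0.3), "c": math.log(0.2)},
    ("a",): {"c": math.log(0.9), "b": math.log(0.1)},
}
BACKOFF_PENALTY = math.log(0.4)  # assumed fixed backoff weight
UNKNOWN_LOGPROB = -100.0         # assumed floor for tokens the LM has never seen


def lm_score(context, token):
    """Log-probability of token given context, backing off to shorter contexts."""
    penalty = 0.0
    ctx = tuple(context[-1:])  # bigram model: keep at most one token of history
    while True:
        table = TOY_LM.get(ctx)
        if table is not None and token in table:
            return penalty + table[token]
        if not ctx:
            return UNKNOWN_LOGPROB
        ctx = ctx[1:]               # drop the oldest history token
        penalty += BACKOFF_PENALTY  # pay the backoff cost


def greedy_decode(frames, lm_weight=0.0):
    """Pick, per step, the token maximizing acoustic + lm_weight * LM score."""
    hypothesis = []
    for acoustic in frames:  # each frame: {token: acoustic log-probability}
        best = max(
            acoustic,
            key=lambda tok: acoustic[tok] + lm_weight * lm_score(hypothesis, tok),
        )
        hypothesis.append(best)
    return hypothesis


# Made-up acoustic scores for two decoding steps.
frames = [
    {"a": math.log(0.6), "b": math.log(0.3), "c": math.log(0.1)},
    {"a": math.log(0.1), "b": math.log(0.5), "c": math.log(0.4)},
]
print(greedy_decode(frames))                 # ['a', 'b']
print(greedy_decode(frames, lm_weight=1.0))  # ['a', 'c']
```

In this toy example the LM strongly prefers "c" after "a", so the fused score overturns the acoustic-only choice of "b" at the second step. A GPU-oriented variant would replace the dictionary lookups with batched tensor operations over all hypotheses and vocabulary entries at once, which is the kind of data-structure redesign the abstract describes.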

