AS · AI · CL · LG · SD · May 30, 2025

Pushing the Limits of Beam Search Decoding for Transducer-based ASR models

NVIDIA
arXiv:2506.00185v1 · 2 citations · h-index: 17 · Has code · INTERSPEECH
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in practical ASR applications by making beam search more efficient for Transducers, with incremental improvements in speed and accuracy.

The paper tackles slow beam search decoding in Transducer-based ASR models by introducing a universal acceleration method. It narrows the speed gap between beam and greedy decoding to 10-20% and achieves a 14-30% relative WER improvement over greedy decoding.

Transducer models have emerged as a promising choice for end-to-end ASR systems, offering a balanced trade-off between recognition accuracy, streaming capabilities, and inference speed in greedy decoding. However, beam search significantly slows down Transducers due to repeated evaluations of key network components, limiting practical applications. This paper introduces a universal method to accelerate beam search for Transducers, enabling the implementation of two optimized algorithms: ALSD++ and AES++. The proposed method uses batch operations, a tree-based hypothesis structure, novel blank scoring for enhanced shallow fusion, and CUDA graph execution for efficient GPU inference. This narrows the speed gap between beam and greedy modes to only 10-20% for the whole system, achieves a 14-30% relative improvement in WER compared to greedy decoding, and improves shallow fusion in low-resource settings by up to 11% compared to existing implementations. All the algorithms are open-sourced.
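To illustrate why a tree-based hypothesis structure is cheaper than copying full prefixes, here is a minimal sketch, assuming a simplified label-synchronous step in which every beam slot emits exactly one symbol (blank or non-blank) per step. Each step stores only a token and a parent pointer per slot; full transcripts are recovered by backtracking. The class `HypothesisTree` and the convention `BLANK = 0` are hypothetical and not taken from the paper's code. Frame advancement, hypothesis recombination, and LM fusion are omitted, and a GPU implementation would batch this over (batch, beam, vocab) tensors rather than loop in Python.

```python
import heapq
from typing import List, Sequence

BLANK = 0  # hypothetical convention for this sketch: index 0 is the blank symbol


class HypothesisTree:
    """Beam hypotheses stored as a tree: each step keeps only (token, parent) per slot."""

    def __init__(self, beam_size: int):
        self.beam_size = beam_size
        self.tokens: List[List[int]] = []   # tokens[t][k]: symbol emitted by slot k at step t
        self.parents: List[List[int]] = []  # parents[t][k]: slot at step t-1 that slot k extends
        # Only slot 0 is live at the start; the others can never win while V >= beam_size.
        self.scores: List[float] = [0.0] + [float("-inf")] * (beam_size - 1)

    def step(self, log_probs: Sequence[Sequence[float]]) -> None:
        """Expand every slot by every symbol and keep the best beam_size candidates."""
        candidates = (
            (self.scores[k] + lp, k, v)
            for k, row in enumerate(log_probs)
            for v, lp in enumerate(row)
        )
        best = heapq.nlargest(self.beam_size, candidates)
        self.scores = [score for score, _, _ in best]
        self.parents.append([k for _, k, _ in best])
        self.tokens.append([v for _, _, v in best])

    def transcript(self, slot: int) -> List[int]:
        """Follow parent pointers back to the root, dropping blanks."""
        out = []
        for t in range(len(self.tokens) - 1, -1, -1):
            tok = self.tokens[t][slot]
            if tok != BLANK:
                out.append(tok)
            slot = self.parents[t][slot]
        return out[::-1]


tree = HypothesisTree(beam_size=2)
tree.step([[-1.0, -0.5, -2.0], [-0.25, -0.25, -0.25]])
tree.step([[-0.25, -4.0, -1.0], [-4.0, -4.0, -0.125]])
print(tree.scores, tree.transcript(0), tree.transcript(1))
```

The memory per step is O(beam size) integers regardless of hypothesis length, which is the property that makes such a structure attractive for batched GPU decoding; the trade-off is that reading out a transcript requires a backward walk over all stored steps.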

