AS | LG | Nov 8, 2023

GPU-Accelerated WFST Beam Search Decoder for CTC-based Speech Recognition

arXiv:2311.04996v1 | 5 citations | h-index: 3 | Has Code
Originality: Incremental advance
AI Analysis

This work targets decoding speed and latency in speech recognition pipelines built on CTC models. It is an incremental advance in decoder optimization.

The paper tackles the performance bottleneck of CPU-based beam search decoding in CTC-based speech recognition by introducing a GPU-accelerated WFST decoder. Compared with the current state-of-the-art CPU decoder, it achieves up to 7 times higher throughput in the offline scenario and nearly 8 times lower latency in the online streaming scenario, with the same or better word error rate.

While Connectionist Temporal Classification (CTC) models deliver state-of-the-art accuracy in automated speech recognition (ASR) pipelines, their performance has been limited by CPU-based beam search decoding. We introduce a GPU-accelerated Weighted Finite State Transducer (WFST) beam search decoder compatible with current CTC models. It increases pipeline throughput and decreases latency, supports streaming inference, and also supports advanced features like utterance-specific word boosting via on-the-fly composition. We provide pre-built DLPack-based Python bindings for ease of use with Python-based machine learning frameworks at https://github.com/nvidia-riva/riva-asrlib-decoder. We evaluated our decoder in offline and online scenarios, demonstrating that it is the fastest beam search decoder for CTC models. In the offline scenario, it achieves up to 7 times more throughput than the current state-of-the-art CPU decoder, and in the online streaming scenario, it achieves nearly 8 times lower latency, with the same or better word error rate.
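To make concrete what a beam search decoder does with CTC outputs, here is a minimal CPU sketch of CTC prefix beam search in pure Python. It is not the paper's GPU WFST decoder and does not use the riva-asrlib-decoder API: there is no WFST, no language model, and no word boosting. The function name `ctc_prefix_beam_search` and its parameters are hypothetical, chosen only for illustration. The sketch assumes the input is a list of frames, each a list of per-symbol log-probabilities, with the blank symbol at index 0.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(log_probs, beam_size=4, blank=0):
    """Toy CTC prefix beam search (hypothetical helper, no LM or WFST).

    log_probs: list of frames, each a list of log-probabilities per symbol.
    Returns (best label sequence, its total log-probability).
    """
    # Each prefix tracks two scores: paths ending in blank, paths ending in non-blank.
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for sym, lp in enumerate(frame):
                if sym == blank:
                    # Blank keeps the prefix unchanged.
                    n_b, n_nb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(n_b, p_b + lp, p_nb + lp), n_nb)
                elif prefix and sym == prefix[-1]:
                    # Repeated symbol without a blank in between collapses.
                    n_b, n_nb = next_beams[prefix]
                    next_beams[prefix] = (n_b, logsumexp(n_nb, p_nb + lp))
                    # Repeated symbol after a blank extends the prefix.
                    ext = prefix + (sym,)
                    e_b, e_nb = next_beams[ext]
                    next_beams[ext] = (e_b, logsumexp(e_nb, p_b + lp))
                else:
                    # A different symbol always extends the prefix.
                    ext = prefix + (sym,)
                    e_b, e_nb = next_beams[ext]
                    next_beams[ext] = (e_b, logsumexp(e_nb, p_b + lp, p_nb + lp))
        # Prune to the beam_size highest-scoring prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    best_prefix, best_scores = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best_prefix), logsumexp(*best_scores)


# Two frames over the vocabulary {0: blank, 1: "a"}.
frames = [[math.log(0.6), math.log(0.4)]] * 2
tokens, score = ctc_prefix_beam_search(frames, beam_size=4)
print(tokens, math.exp(score))
```

In this two-frame example, picking the most likely symbol per frame yields blank twice, so greedy decoding outputs an empty sequence with probability 0.36. The beam search instead returns `[1]`, because the three alignments that collapse to "a" together have probability 0.64. Summing over alignments in this way, frame by frame and over many hypotheses, is the work that a decoder like the one described above moves from CPU to GPU.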

Code Implementations: 1 repo
