AS AI CL LG SD · Aug 10, 2025

FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities

Lilit Grigoryan, Vladimir Bataev, Nikolay Karpov, Andrei Andrusenko, Vitaly Lavrukhin, Boris Ginsburg

NVIDIA

arXiv:2508.07315v24.33 citations · h-index: 17 · Has Code

Originality: Incremental advance

AI Analysis

FlexCTC gives speech recognition researchers and practitioners a faster, more efficient alternative to existing decoders, though the advance is incremental because it builds on established CTC and beam search methods.

The paper addresses the slow, CPU-bound nature of standard beam search for CTC-based speech recognition by introducing FlexCTC, a fully GPU-based toolkit that eliminates CPU-GPU synchronization and supports advanced contextualization, making decoding fast and efficient enough for both research and production.

While beam search improves speech recognition quality over greedy decoding, standard implementations are slow, often sequential, and CPU-bound. To fully leverage modern hardware capabilities, we present FlexCTC, a novel open-source toolkit for fully GPU-based beam decoding of Connectionist Temporal Classification (CTC) models. Developed entirely in Python and PyTorch, it offers a fast, user-friendly, and extensible alternative to traditional C++, CUDA, or WFST-based decoders. The toolkit features a high-performance, fully batched GPU implementation that eliminates CPU-GPU synchronization and minimizes kernel launch overhead via CUDA Graphs. It also supports advanced contextualization techniques, including GPU-powered N-gram language model fusion and phrase-level boosting. These features enable accurate and efficient decoding, making the toolkit suitable for both research and production use.
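
To make the greedy versus beam search distinction concrete, here is a minimal pure-Python reference sketch of CTC greedy decoding and CTC prefix beam search with an optional shallow-fusion hook. It is not FlexCTC's implementation or API: the toolkit described above is batched and GPU-resident, while this sketch is sequential and CPU-only. The function names, the `lm_score` callable, and the `lm_weight` parameter are all hypothetical and chosen for illustration only.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over a few scalars."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_greedy(log_probs, blank=0):
    """Greedy baseline: best label per frame, collapse repeats, drop blanks."""
    out, prev = [], None
    for frame in log_probs:
        best = max(range(len(frame)), key=frame.__getitem__)
        if best != blank and best != prev:
            out.append(best)
        prev = best
    return tuple(out)


def ctc_prefix_beam_search(log_probs, beam_size=4, blank=0,
                           lm_score=None, lm_weight=0.0):
    """Sequential CTC prefix beam search (reference sketch, not batched).

    log_probs: list of frames, each a list of per-token log-probabilities.
    lm_score:  hypothetical callable (prefix, token) -> log-score, added
               with weight lm_weight whenever a prefix is extended.
    """
    # Each prefix tracks (log P ending in blank, log P ending in non-blank).
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            total = logsumexp(p_b, p_nb)
            for token, lp in enumerate(frame):
                if token == blank:
                    # Blank keeps the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, total + lp), nb)
                    continue
                if prefix and token == prefix[-1]:
                    # Repeat without an intervening blank collapses
                    # into the same prefix.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (b, logsumexp(nb, p_nb + lp))
                    base = p_b  # a true extension needs a blank in between
                else:
                    base = total
                bonus = lm_weight * lm_score(prefix, token) if lm_score else 0.0
                new = prefix + (token,)
                b, nb = nxt[new]
                nxt[new] = (b, logsumexp(nb, base + lp + bonus))
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]),
                        reverse=True)
        beams = dict(ranked[:beam_size])
    best_prefix, scores = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return best_prefix, logsumexp(*scores)


# Two frames over the vocabulary {0: blank, 1: "a"}, with P(blank) = 0.6.
frames = [[math.log(0.6), math.log(0.4)]] * 2
print(ctc_greedy(frames))                    # ()
print(ctc_prefix_beam_search(frames)[0])     # (1,)
```

In this toy input, greedy decoding picks the blank in both frames and returns the empty sequence (path probability 0.36), while beam search sums the three alignments that collapse to "a" (0.16 + 0.24 + 0.24 = 0.64) and returns it instead. The nested per-prefix, per-token Python loops are exactly the kind of sequential, CPU-bound work that a batched GPU decoder is meant to avoid.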
