LG SD AS
Oct 10, 2025

FLToP CTC: Frame-Level Token Pruning via Relative Threshold for Efficient and Memory-Saving Decoding on Diverse Platforms

arXiv:2510.09085v1
h-index: 2

Originality: Incremental advance

AI Analysis

The paper addresses bottlenecks in CTC decoders for resource-limited environments, improving the efficiency and accessibility of speech recognition. The advance is incremental because it builds on existing CTC methods.

This paper tackles the computational and memory inefficiencies in CTC-based ASR systems by introducing FLToP CTC, a decoding algorithm that uses frame-level token pruning with a relative threshold, achieving a 10.5x runtime speedup and 2.78x memory reduction on LibriSpeech with negligible WER degradation.

CTC-based ASR systems face computational and memory bottlenecks in resource-limited environments. Traditional CTC decoders, which can account for up to 90% of processing time in some systems (e.g., wav2vec2-large on L4 GPUs), are inefficient because they perform exhaustive token-level operations. This paper introduces Frame Level Token Pruning for Connectionist Temporal Classification (FLToP CTC), a novel decoding algorithm that employs frame-level token pruning guided by a relative threshold probability. By dynamically eliminating low-probability tokens per frame, FLToP CTC reduces compute and memory demands with negligible WER degradation. On LibriSpeech, FLToP CTC achieves a 10.5x runtime speedup and 2.78x memory reduction versus standard CTC decoders. Its simplicity enables seamless integration into CTC decoders across platforms (CPUs, GPUs, etc.). FLToP CTC addresses CTC bottlenecks, offering scalability for resource-limited environments and real-time applications, and improving the accessibility and efficiency of speech recognition.
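The per-frame pruning step described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the exact rule, so this assumes that "relative threshold" means a cutoff proportional to the highest token probability in each frame. The function names `prune_frame` and `prune_utterance` and the parameter `rel_threshold` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def prune_frame(probs: List[float], rel_threshold: float) -> List[Tuple[int, float]]:
    """Keep (token_id, prob) pairs with prob >= rel_threshold * max(probs).

    Assumption: the cutoff is relative to the frame's top probability;
    the paper's exact rule may differ.
    """
    cutoff = rel_threshold * max(probs)
    return [(i, p) for i, p in enumerate(probs) if p >= cutoff]


def prune_utterance(
    frames: List[List[float]], rel_threshold: float
) -> List[List[Tuple[int, float]]]:
    """Apply the per-frame pruning independently to every frame."""
    return [prune_frame(frame, rel_threshold) for frame in frames]


# Toy posteriors over a 4-token vocabulary for two frames.
frames = [
    [0.90, 0.05, 0.03, 0.02],  # confident frame: one dominant token
    [0.40, 0.35, 0.15, 0.10],  # ambiguous frame: two plausible tokens
]
pruned = prune_utterance(frames, rel_threshold=0.5)
kept_counts = [len(frame) for frame in pruned]
```

Under this assumption, a downstream beam search would expand only the surviving tokens in each frame, so confident frames contribute very few candidates while ambiguous frames keep more.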
