AS, AI, CL, LG, SD · Mar 18, 2023

Powerful and Extensible WFST Framework for RNN-Transducer Losses

NVIDIA
arXiv:2303.10384v1 · 7 citations · h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses the inflexibility of existing RNN-T implementations, which makes them hard for speech recognition researchers and developers to modify, by offering an extensible framework. The contribution is incremental, as it builds on established WFST and RNN-T concepts.

The paper tackles the difficulty of extending and debugging RNN-Transducer (RNN-T) losses by introducing a WFST-based framework. The framework yields two implementations, Compose-Transducer and Grid-Transducer, that are computationally competitive and efficient, plus a new W-Transducer loss that outperforms standard RNN-T in weakly supervised setups where parts of the transcriptions are missing.

This paper presents a framework based on Weighted Finite-State Transducers (WFSTs) that simplifies developing modifications of the RNN-Transducer (RNN-T) loss. Existing RNN-T implementations rely on CUDA-related code, which is hard to extend and debug. WFSTs are easy to construct and extend, and they allow debugging through visualization. We introduce two WFST-powered RNN-T implementations: (1) "Compose-Transducer", based on a composition of the WFST graphs from the acoustic and textual schemas, which is computationally competitive and easy to modify; and (2) "Grid-Transducer", which constructs the lattice directly for further computations and is the most compact and computationally efficient. We illustrate the ease of extensibility by introducing a new W-Transducer loss, an adaptation of Connectionist Temporal Classification with Wild Cards. W-Transducer (W-RNNT) consistently outperforms the standard RNN-T in a weakly supervised data setup where parts of the transcriptions are missing at the beginning and end of utterances. All RNN-T losses are implemented with the k2 framework and are available in the NeMo toolkit.
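What follows is a minimal pure-Python sketch of the idea behind these losses. It assumes a joint-network output of shape T × (U+1) × V, given as nested lists of log-probabilities. The sketch builds the RNN-T alignment lattice directly as a list of weighted arcs, loosely in the spirit of Grid-Transducer, and sums over all paths with the forward algorithm. It does not use k2 or NeMo, and every function name in it is hypothetical. The `wildcard=True` option is an assumed, simplified reading of W-Transducer: two extra rows of zero-cost arcs let an alignment skip an untranscribed prefix and suffix of frames. It is not the paper's exact graph.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def build_lattice(log_probs, labels, blank=0, wildcard=False):
    """Return (arcs, start, final) for an RNN-T alignment lattice (hypothetical helper).

    log_probs[t][u][k] is the log-probability of token k at frame t after
    u emitted labels. States are (t, u) pairs; arcs are (src, dst, log_weight).
    """
    T, U = len(log_probs), len(labels)
    arcs = []
    for t in range(T):
        for u in range(U + 1):
            # Blank consumes frame t and keeps the label position u.
            arcs.append(((t, u), (t + 1, u), log_probs[t][u][blank]))
            # A label emission advances u without consuming a frame.
            if u < U:
                arcs.append(((t, u), (t, u + 1), log_probs[t][u][labels[u]]))
    if not wildcard:
        return arcs, (0, 0), (T, U)
    # Assumed wildcard extension (simplified, not the paper's exact topology):
    # row u = -1 absorbs an untranscribed prefix of frames, and row u = U + 1
    # absorbs an untranscribed suffix, both at log-weight 0 (probability 1).
    for t in range(T):
        arcs.append(((t, -1), (t + 1, -1), 0.0))        # skip a prefix frame
        arcs.append(((t, -1), (t, 0), 0.0))             # start the transcript at frame t
        arcs.append(((t, U + 1), (t + 1, U + 1), 0.0))  # skip a suffix frame
    for t in range(T + 1):
        arcs.append(((t, U), (t, U + 1), 0.0))          # transcript finished
    return arcs, (0, -1), (T, U + 1)


def forward_log_score(arcs, start, final):
    """Log-sum of path weights from start to final (forward algorithm)."""
    outgoing = defaultdict(list)
    states = {start, final}
    for src, dst, w in arcs:
        outgoing[src].append((dst, w))
        states.update((src, dst))
    alpha = dict.fromkeys(states, NEG_INF)
    alpha[start] = 0.0
    # Every arc increases (t, u) lexicographically, so sorted order is topological.
    for s in sorted(states):
        if alpha[s] == NEG_INF:
            continue
        for dst, w in outgoing[s]:
            alpha[dst] = log_add(alpha[dst], alpha[s] + w)
    return alpha[final]


def transducer_loss(log_probs, labels, blank=0, wildcard=False):
    """Negative log of the summed weight of all lattice paths."""
    arcs, start, final = build_lattice(log_probs, labels, blank, wildcard)
    return -forward_log_score(arcs, start, final)
```

As a usage check, take two frames, one label, and a uniform two-token distribution (`lp = [[[math.log(0.5)] * 2 for _ in range(2)] for _ in range(2)]`). There, `transducer_loss(lp, [1])` equals log 4, because two alignments of probability 1/8 each are summed. With `wildcard=True` the loss drops to -log 2, since paths that skip leading or trailing frames join the sum. In this sketch, extending the loss only requires adding arcs rather than rewriting a kernel. That is the kind of modification the WFST framing is meant to ease.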

Code Implementations: 1 repo
