AS · AI · CL · LG · SD · Dec 7, 2022

Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers

arXiv:2212.04325v3 · 4 citations · h-index: 104
Originality: Incremental advance
AI Analysis

This work addresses efficiency and performance bottlenecks in neural transducer training for speech recognition, offering incremental improvements over existing methods.

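The paper addresses the lack of lattice-free sequence discriminative training methods for RNN-Transducers in automatic speech recognition. It proposes three such objectives, which achieve up to 6.5% relative word error rate improvement over a sequence-level cross-entropy baseline and a 40%-70% relative training time speedup over N-best-list based minimum Bayes risk training, at the cost of a small performance degradation relative to those N-best methods.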

Recently, RNN-Transducers have achieved remarkable results on various automatic speech recognition tasks. However, lattice-free sequence discriminative training methods, which obtain superior performance in hybrid models, are rarely investigated in RNN-Transducers. In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-level minimum Bayes risk, and lattice-free minimum Bayes risk, which are used for the final posterior output of the phoneme-based neural transducer with a limited context dependency. Compared to criteria using N-best lists, lattice-free methods eliminate the decoding step for hypotheses generation during training, which leads to more efficient training. Experimental results show that lattice-free methods gain up to 6.5% relative improvement in word error rate compared to a sequence-level cross-entropy trained model. Compared to the N-best-list based minimum Bayes risk objectives, lattice-free methods gain 40% - 70% relative training time speedup with a small degradation in performance.
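The efficiency argument in the abstract rests on the limited context dependency: if each output depends only on a short label history, the sum over all competing hypotheses can be computed exactly with a forward recursion over a small set of context states, so no decoding step is needed to generate hypotheses. The Python sketch below illustrates this for an MMI-style objective in a toy setting. It is not the authors' implementation: the frame-synchronous "one label or blank per frame" topology, the context of exactly one previous non-blank phoneme, the bigram phoneme LM, the scales `am_scale`/`lm_scale`, and all function names are assumptions made for illustration.

```python
import itertools

import numpy as np

BLANK = 0  # toy assumption: label 0 is blank, labels 1..V-1 are phonemes


def step_score(am_logp, lm_logp, t, ctx, k, am_scale, lm_scale):
    """Log score for emitting label k at frame t given previous non-blank label ctx."""
    score = am_scale * am_logp[t, ctx, k]
    if k != BLANK:
        score += lm_scale * lm_logp[ctx, k]  # hypothetical bigram LM scores phonemes only
    return score


def denominator_forward(am_logp, lm_logp, am_scale, lm_scale):
    """Log-sum of scores over ALL label sequences; exact since context is one label."""
    T, V, _ = am_logp.shape
    alpha = np.full(V, -np.inf)  # alpha[c]: paths whose last non-blank label is c
    alpha[BLANK] = 0.0           # context 0 doubles as "nothing emitted yet"
    for t in range(T):
        new = np.full(V, -np.inf)
        for ctx in range(V):
            if np.isneginf(alpha[ctx]):
                continue
            for k in range(V):
                nxt = ctx if k == BLANK else k  # blank keeps the context
                s = alpha[ctx] + step_score(am_logp, lm_logp, t, ctx, k, am_scale, lm_scale)
                new[nxt] = np.logaddexp(new[nxt], s)
        alpha = new
    return np.logaddexp.reduce(alpha)


def numerator_forward(am_logp, lm_logp, ref, am_scale, lm_scale):
    """Log-sum of scores over all blank placements that spell out the reference."""
    T = am_logp.shape[0]
    N = len(ref)
    alpha = np.full(N + 1, -np.inf)  # alpha[n]: first n reference labels emitted
    alpha[0] = 0.0
    for t in range(T):
        new = np.full(N + 1, -np.inf)
        for n in range(N + 1):
            if np.isneginf(alpha[n]):
                continue
            ctx = BLANK if n == 0 else ref[n - 1]
            stay = alpha[n] + step_score(am_logp, lm_logp, t, ctx, BLANK, am_scale, lm_scale)
            new[n] = np.logaddexp(new[n], stay)
            if n < N:
                move = alpha[n] + step_score(am_logp, lm_logp, t, ctx, ref[n], am_scale, lm_scale)
                new[n + 1] = np.logaddexp(new[n + 1], move)
        alpha = new
    return alpha[N]


def lf_mmi_loss(am_logp, lm_logp, ref, am_scale=1.0, lm_scale=0.5):
    """Negative MMI, -(log numerator - log denominator); no N-best list is decoded."""
    num = numerator_forward(am_logp, lm_logp, ref, am_scale, lm_scale)
    den = denominator_forward(am_logp, lm_logp, am_scale, lm_scale)
    return -(num - den)


def brute_force(am_logp, lm_logp, ref, am_scale, lm_scale):
    """Enumerate all V**T label sequences; only for checking tiny cases."""
    T, V, _ = am_logp.shape
    all_scores, ref_scores = [], []
    for seq in itertools.product(range(V), repeat=T):
        ctx, score = BLANK, 0.0
        for t, k in enumerate(seq):
            score += step_score(am_logp, lm_logp, t, ctx, k, am_scale, lm_scale)
            if k != BLANK:
                ctx = k
        all_scores.append(score)
        if [k for k in seq if k != BLANK] == list(ref):
            ref_scores.append(score)
    return np.logaddexp.reduce(ref_scores), np.logaddexp.reduce(all_scores)


rng = np.random.default_rng(0)
T, V = 4, 3
x = rng.normal(size=(T, V, V))
am_logp = x - np.log(np.exp(x).sum(axis=-1, keepdims=True))  # log-softmax over labels
y = rng.normal(size=(V, V))
lm_logp = y - np.log(np.exp(y).sum(axis=-1, keepdims=True))
print(lf_mmi_loss(am_logp, lm_logp, ref=[1, 2]))
```

The denominator recursion costs O(T·V²) instead of enumerating V^T sequences, which is the property that lets lattice-free training skip hypothesis generation. With `lm_scale=0` and a locally normalized model, the denominator collapses to log 1 = 0 because each frame's label distribution sums to one, so the sketch adds a scaled LM term to make it informative; `brute_force` exists only to confirm that both recursions match exhaustive enumeration on tiny inputs.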

