CL · AS · Sep 27, 2021

Factorized Neural Transducer for Efficient Language Model Adaptation

arXiv:2110.01500v5 · 46 citations
Originality: Incremental advance
AI Analysis

This work addresses a practical limitation of streaming end-to-end ASR systems by allowing more flexible language model adaptation. It is incremental, however, because it builds on existing neural Transducer frameworks.

The paper tackles the challenge of language model adaptation in neural Transducer-based ASR systems by proposing a factorized neural Transducer that separates blank and vocabulary predictions, so that a standalone language model can handle the vocabulary prediction. This approach yields 15% to 20% WER improvements when out-of-domain text data is used for adaptation, at the cost of a minor WER degradation on a general test set.

In recent years, end-to-end (E2E) automatic speech recognition (ASR) systems have achieved great success due to their simplicity and promising performance. Neural Transducer based models are increasingly popular in streaming E2E ASR systems and have been reported to outperform the traditional hybrid system in some scenarios. However, the joint optimization of the acoustic model, lexicon and language model in the neural Transducer also makes it challenging to use text-only data for language model adaptation. This drawback may limit their practical application. To address this issue, we propose a novel model, the factorized neural Transducer, which factorizes the blank and vocabulary prediction and adopts a standalone language model for the vocabulary prediction. This factorization is expected to transfer improvements in the standalone language model to the Transducer for speech recognition, which allows various language model adaptation techniques to be applied. We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation, at the cost of a minor degradation in WER on a general test set.
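
The factorization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the additive combination of acoustic and language model scores, and all numeric values below are assumptions chosen for illustration. The abstract only states that blank and vocabulary prediction are separated and that a standalone language model handles the vocabulary part.

```python
import math


def log_softmax(xs):
    """Numerically stable log-softmax over a list of floats."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def factorized_output(blank_logit, acoustic_vocab_logits, lm_vocab_logits):
    """Return log-probabilities over [blank] + vocabulary.

    Assumption (not stated in the abstract): vocabulary scores are the sum of
    an acoustic score and the standalone LM's log-probability, and the blank
    score comes from a separate branch that never sees the LM.
    """
    lm_log_probs = log_softmax(lm_vocab_logits)
    vocab_scores = [a + l for a, l in zip(acoustic_vocab_logits, lm_log_probs)]
    return log_softmax([blank_logit] + vocab_scores)


# Hypothetical scores for one decoding step with a 3-token vocabulary.
blank_logit = 0.0
acoustic = [1.0, 1.0, 0.5]

general_lm = [2.0, 0.0, 0.0]   # LM trained on general text prefers token 0
adapted_lm = [0.0, 2.0, 0.0]   # LM adapted on in-domain text prefers token 1

general = factorized_output(blank_logit, acoustic, general_lm)
adapted = factorized_output(blank_logit, acoustic, adapted_lm)

# Index 0 is blank, so vocabulary token k sits at index k + 1.
best_general = max(range(len(general)), key=general.__getitem__)
best_adapted = max(range(len(adapted)), key=adapted.__getitem__)
print(best_general, best_adapted)
```

In this sketch, only the language model scores change between the two calls. The blank logit and the acoustic scores are reused unchanged, which mirrors the idea that the language model can be adapted on text alone without retraining the rest of the Transducer.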

Code Implementations: 1 repo
