CL SD AS · Apr 1, 2022

Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems

Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, George Saon

arXiv:2204.00212v25.231 citations · h-index: 40

Originality: Incremental advance

AI Analysis

This work improves near state-of-the-art ASR systems, but the advance is incremental because it builds on existing LLM rescoring methods.

The study applies large-scale language models (LLMs) to rescore N-best outputs from a competitive ASR system, the Conformer-Transducer model. It finds consistent improvement from the LLM's bidirectionality, pretraining, in-domain finetuning and context augmentation, and its lexical analysis offers insight into how each component may contribute.

Large-scale language models (LLMs) such as GPT-2, BERT and RoBERTa have been successfully applied to ASR N-best rescoring. However, whether or how they can benefit competitive, near state-of-the-art ASR systems remains unexplored. In this study, we incorporate LLM rescoring into one of the most competitive ASR baselines: the Conformer-Transducer model. We demonstrate that consistent improvement is achieved by the LLM's bidirectionality, pretraining, in-domain finetuning and context augmentation. Furthermore, our lexical analysis sheds light on how each of these components may be contributing to the ASR performance.
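The abstract does not spell out how N-best rescoring works, so here is a minimal sketch of the general idea, not the authors' implementation. It assumes the common log-linear combination of a first-pass ASR score with a language-model score. The names `rescore_nbest`, `toy_lm_score` and `lm_weight`, the toy unigram table, and all scores are hypothetical placeholders. In practice `lm_score` would be a log-likelihood (or pseudo-log-likelihood for a bidirectional model) from an LLM such as GPT-2, BERT or RoBERTa.

```python
import math
from typing import Callable, List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank hypotheses by asr_score + lm_weight * lm_score(text), best first.

    nbest holds (hypothesis text, first-pass ASR log-score) pairs.
    lm_weight is an assumed interpolation weight, normally tuned on held-out data.
    """
    rescored = [(text, asr + lm_weight * lm_score(text)) for text, asr in nbest]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for an LLM: unigram log-probabilities (made-up values).
TOY_LOGPROBS = {
    "recognize": math.log(0.2),
    "speech": math.log(0.2),
    "wreck": math.log(0.01),
    "a": math.log(0.1),
    "nice": math.log(0.05),
    "beach": math.log(0.02),
}


def toy_lm_score(text: str) -> float:
    """Sum of unigram log-probs, with a small floor for unknown words."""
    return sum(TOY_LOGPROBS.get(word, math.log(1e-4)) for word in text.split())


# The first pass slightly prefers the acoustically similar but unlikely hypothesis.
nbest = [("wreck a nice beach", -4.0), ("recognize speech", -4.5)]
best_text, best_score = rescore_nbest(nbest, toy_lm_score, lm_weight=0.5)[0]
print(best_text)
```

In this toy case the language-model term outweighs the small first-pass preference, so the re-ranked list puts "recognize speech" first. The weight and scoring function are illustrative choices only.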
