CL · SD · AS · Mar 2, 2022

Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems

arXiv:2203.00888v2 · 25 citations · h-index: 57
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of customizing ASR systems for better accuracy on domain-specific terms. The contribution appears incremental, since it builds on existing biasing approaches.

The paper tackles contextual biasing in end-to-end automatic speech recognition (ASR) systems. It adds a contextual spelling correction model on top of the ASR system, using contextual information to improve recognition of specific phrases such as names and other proper nouns. The authors report up to 51% relative word error rate (WER) reduction over the ASR system and better results than traditional biasing methods.

Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems, which aims to achieve better recognition performance by biasing the ASR system to particular context phrases such as person names, music lists, proper nouns, etc. Existing methods mainly include contextual LM biasing and adding a bias encoder into end-to-end ASR models. In this work, we introduce a novel approach to contextual biasing by adding a contextual spelling correction model on top of the end-to-end ASR system. We incorporate contextual information into a sequence-to-sequence spelling correction model with a shared context encoder. Our proposed model includes two different mechanisms: autoregressive (AR) and non-autoregressive (NAR). We propose filtering algorithms to handle large-size context lists, and performance balancing mechanisms to control the biasing degree of the model. We demonstrate that the proposed model is a general biasing solution which is domain-insensitive and can be adopted in different scenarios. Experiments show that the proposed method achieves as much as 51% relative word error rate (WER) reduction over the ASR system and outperforms traditional biasing methods. Compared to the AR solution, the proposed NAR model reduces model size by 43.2% and speeds up inference by 2.1 times.
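
For readers unfamiliar with the headline metric, the sketch below shows how word error rate and a relative WER reduction are usually computed. It is a minimal illustration, not the authors' evaluation code: the function names `word_error_rate` and `relative_wer_reduction`, the whitespace tokenization, and the sample sentences and WER values are all assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()   # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical utterance: the ASR output misspells a contact name.
    print(word_error_rate("call john smith now", "call jon smith now"))
    # Hypothetical WER values chosen only to illustrate a 51% relative reduction.
    print(relative_wer_reduction(0.10, 0.049))
```

A relative reduction is measured against the baseline's own error rate, so a 51% relative reduction means roughly half of the baseline's word errors are removed, not that the absolute WER drops by 51 points.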

