AI | CL | Sep 18, 2023

A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting

arXiv:2309.09552v4 | 10 citations | h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses the problem of improving ASR accuracy for rare entities, which is important for applications like voice assistants, but it is incremental as it builds on the existing Whisper model.

The paper tackles the recognition of rare named entities in automatic speech recognition by introducing KWS-Whisper, which enhances Whisper with open-vocabulary keyword spotting and contextual biasing. It reports significant improvements in entity recall over the original Whisper model on Chinese Aishell hot word subsets and two internal code-switching test sets.

The recognition of rare named entities, such as personal names and terminologies, is challenging for automatic speech recognition (ASR) systems, especially when they are not frequently observed in the training data. In this paper, we introduce keyword spotting enhanced Whisper (KWS-Whisper), a novel ASR system that leverages the Whisper model and performs open-vocabulary keyword spotting (OV-KWS) on the hidden states of the Whisper encoder to recognize user-defined named entities. These entities serve as prompts for the Whisper decoder. To optimize the model, we propose a multitask training approach that learns OV-KWS and contextual-ASR tasks. We evaluate our approach on Chinese Aishell hot word subsets and two internal code-switching test sets and show that it significantly improves the entity recall compared to the original Whisper model. Moreover, we demonstrate that the OV-KWS can be a plug-and-play module to enhance the ASR error correction methods and frozen Whisper models.
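
The data flow described in the abstract (spot user-defined entities on encoder hidden states, then pass the detected entities to the decoder as a prompt) can be illustrated with a toy sketch. This is not the paper's implementation: the actual OV-KWS module is a trained component, whereas the sketch below swaps in a plain cosine-similarity match against hand-made vectors. The names `spot_keywords` and `build_prompt`, the threshold of 0.8, and the comma-separated prompt format are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def spot_keywords(hidden_states, keyword_embeddings, threshold=0.8):
    """Hypothetical stand-in for OV-KWS: keep each keyword whose embedding
    is close to at least one encoder frame (threshold is an assumption)."""
    detected = []
    for word, embedding in keyword_embeddings.items():
        best = max((cosine(frame, embedding) for frame in hidden_states),
                   default=0.0)
        if best >= threshold:
            detected.append(word)
    return detected


def build_prompt(detected):
    """Join detected entities into a decoder prompt (format is an assumption)."""
    return ", ".join(detected)


# Toy stand-ins for encoder hidden states and user-defined entity embeddings.
hidden_states = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
keyword_embeddings = {
    "Aishell": [0.9, 0.1, 0.0],   # close to the first frame
    "Whisper": [0.0, 0.0, 1.0],   # orthogonal to every frame
}

detected = spot_keywords(hidden_states, keyword_embeddings)
prompt = build_prompt(detected)
```

In this toy setup only the first keyword passes the threshold, so only that entity would reach the decoder as a biasing prompt. Filtering the user's entity list down to those actually detected in the audio is the part of the design the sketch is meant to highlight.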

