Jun 4, 2024

Keyword-Guided Adaptation of Automatic Speech Recognition

arXiv:2406.02649v1 · 5 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of accurate jargon recognition in ASR for noisy and specialized domains, representing an incremental improvement over existing models.

The paper tackles the problem of recognizing specialized jargon in noisy environments by proposing a keyword-guided adaptation method for Whisper-based ASR models. It reports an average 5.1% improvement in word error rate (WER) over Whisper when generalizing to unseen languages.
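Because the headline result is stated as a WER improvement, a short sketch of how WER is conventionally computed may help: the word-level edit distance (substitutions + deletions + insertions) between a reference and a hypothesis, divided by the number of reference words. This is the standard textbook definition, not code from the paper, and the function name `word_error_rate` is hypothetical. The summary does not say whether the 5.1% figure is an absolute or a relative reduction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One misrecognized jargon word in a five-word reference gives WER = 1/5.
print(word_error_rate("the patient needs a stent", "the patient needs a tent"))
```

A single missed domain term moves WER noticeably on short utterances, which is one reason keyword-level accuracy is reported alongside overall WER.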

Automatic Speech Recognition (ASR) technology has made significant progress in recent years, providing accurate transcription across various domains. However, challenges remain, especially in noisy environments and with specialized jargon. In this paper, we propose a novel approach for improved jargon word recognition through contextual biasing of Whisper-based models. We employ a keyword spotting model that leverages the Whisper encoder representation to dynamically generate prompts that guide the decoder during transcription. We introduce two approaches to effectively steer the decoder towards these prompts: KG-Whisper, which fine-tunes the Whisper decoder, and KG-Whisper-PT, which learns a prompt prefix. Our results show a significant improvement in the recognition accuracy of specified keywords and a reduction in overall word error rates. Specifically, when generalizing to unseen languages, we demonstrate an average WER improvement of 5.1% over Whisper.
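The keyword-spotting-to-prompt flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the stand-in detector scores each candidate keyword by the maximum cosine similarity between a hypothetical keyword embedding and the encoder frame embeddings, and keeps keywords above a fixed threshold. The names `spot_keywords` and `build_prompt`, the 0.8 threshold, the toy 2-D embeddings, and the comma-separated prompt format are all hypothetical; the paper's keyword spotting model is learned on top of Whisper encoder representations, and its architecture is not described here.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def spot_keywords(
    frames: List[Vector],
    keyword_embs: Dict[str, Vector],
    threshold: float = 0.8,  # hypothetical detection threshold
) -> List[str]:
    """Hypothetical stand-in detector: a keyword counts as present if its
    embedding matches at least one encoder frame above the threshold."""
    detected = []
    for word, emb in keyword_embs.items():
        # Max over time: the keyword may be spoken anywhere in the utterance.
        score = max((cosine(f, emb) for f in frames), default=0.0)
        if score >= threshold:
            detected.append(word)
    return detected


def build_prompt(keywords: List[str]) -> str:
    """Join detected keywords into a plain-text prompt (hypothetical format)."""
    return ", ".join(keywords)


# Toy example: two encoder frames and two candidate jargon terms.
frames = [[1.0, 0.0], [0.0, 1.0]]
keyword_embs = {"stent": [1.0, 0.1], "angioplasty": [-1.0, 0.0]}
print(build_prompt(spot_keywords(frames, keyword_embs)))
```

Only keywords detected in the audio end up in the prompt, which is what makes the prompt dynamic rather than a fixed vocabulary list. With the openai-whisper package, a plain-text prompt like this could be passed through the `initial_prompt` argument of `transcribe`; that is ordinary prompting and differs from the paper's fine-tuned KG-Whisper decoder and learned KG-Whisper-PT prefix.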

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
