CL | SD | AS | Jun 2, 2023

Can Contextual Biasing Remain Effective with Whisper and GPT-2?

arXiv:2306.01942v1 | 29 citations | h-index: 64
Originality: Incremental advance
AI Analysis

This paper addresses domain-specific word recognition for users of large-scale ASR and language models. The advance is incremental because it builds on existing biasing techniques.

The paper tackles poor ASR performance on infrequent content words when using Whisper and GPT-2 by proposing a neural contextual biasing method. With a 1000-word biasing list, the method yields a considerable reduction in errors on biasing words across three datasets.

End-to-end automatic speech recognition (ASR) and large language models, such as Whisper and GPT-2, have recently been scaled to use vast amounts of training data. Despite the large amount of training data, infrequent content words that occur in a particular task may still exhibit poor ASR performance, with contextual biasing a possible remedy. This paper investigates the effectiveness of neural contextual biasing for Whisper combined with GPT-2. Specifically, this paper proposes integrating an adapted tree-constrained pointer generator (TCPGen) component for Whisper and a dedicated training scheme to dynamically adjust the final output without modifying any Whisper model parameters. Experiments across three datasets show a considerable reduction in errors on biasing words with a biasing list of 1000 words. Contextual biasing was more effective when applied to domain-specific data and can boost the performance of Whisper and GPT-2 without losing their generality.
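To make the idea of "dynamically adjusting the final output" without touching model parameters more concrete, the following is a minimal sketch of tree-constrained biasing, not the paper's implementation. It assumes character-level tokens, a uniform pointer distribution over the tokens allowed by the prefix tree, and a fixed generation probability `p_gen`; in a TCPGen-style component both the pointer distribution and `p_gen` would be predicted by a trained network. All function names here are hypothetical.

```python
from typing import Dict, List


def build_prefix_tree(words: List[str]) -> dict:
    """Build a character-level prefix tree (trie) from the biasing list."""
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["<end>"] = {}  # marks a complete biasing word
    return root


def valid_next_tokens(tree: dict, prefix: str) -> List[str]:
    """Return the tokens that can extend `prefix` toward a biasing word."""
    node = tree
    for ch in prefix:
        if ch not in node:
            return []  # prefix has left the tree: no biasing applies
        node = node[ch]
    return sorted(k for k in node if k != "<end>")


def bias_distribution(
    model_probs: Dict[str, float],
    tree: dict,
    prefix: str,
    p_gen: float,
) -> Dict[str, float]:
    """Interpolate the frozen model's output with a pointer distribution.

    Assumption: the pointer distribution is uniform over the valid tokens
    and p_gen is given; a real system would learn both.
    """
    allowed = valid_next_tokens(tree, prefix)
    if not allowed:
        return dict(model_probs)  # fall back to the unmodified output
    p_ptr = 1.0 / len(allowed)
    mixed = {tok: (1.0 - p_gen) * p for tok, p in model_probs.items()}
    for tok in allowed:
        mixed[tok] = mixed.get(tok, 0.0) + p_gen * p_ptr
    return mixed


if __name__ == "__main__":
    tree = build_prefix_tree(["turin", "tulip"])
    model_probs = {"r": 0.1, "l": 0.1, "x": 0.8}
    print(bias_distribution(model_probs, tree, "tu", p_gen=0.5))
```

Because only the output distribution is mixed, the underlying model's parameters stay untouched, and when the decoded prefix leaves the tree the output falls back to the original distribution.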

Code Implementations: 1 repo
