CL LG Jan 9, 2024

Continuously Learning New Words in Automatic Speech Recognition

arXiv:2401.04482v4, 2 citations, h-index: 6, ICASSP
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of recognizing domain-specific and rare words in ASR for applications such as lecture transcription. The contribution is incremental, since it builds on existing memory-enhanced models.

The paper tackles the problem of recognizing rare words, such as acronyms and named entities, in automatic speech recognition. It proposes a self-supervised continual learning approach that uses lecture slides to bias the model toward new words. The approach achieves over 80% recall for new words that occur frequently while maintaining general performance.

Despite recent advances, Automatic Speech Recognition (ASR) systems are still far from perfect. Typical errors include acronyms, named entities, and domain-specific special words for which little or no labeled data is available. To address the problem of recognizing these words, we propose a self-supervised continual learning approach: Given the audio of a lecture talk with the corresponding slides, we bias the model towards decoding new words from the slides by using a memory-enhanced ASR model from the literature. Then, we perform inference on the talk, collecting utterances that contain detected new words into an adaptation data set. Continual learning is then performed by training adaptation weights added to the model on this data set. The whole procedure is iterated for many talks. We show that with this approach, we obtain increasing performance on the new words when they occur more frequently (more than 80% recall) while preserving the general performance of the model.
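The procedure described in the abstract can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: the names `collect_adaptation_data` and `run_continual_learning`, the `decode` and `adapt` callables, and the (utterance id, text) representation are all hypothetical. It also assumes that every word taken from the slides counts as a new-word candidate and that adaptation runs once per talk on the data collected so far; the abstract does not specify either detail.

```python
import re
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical representation: (utterance id, decoded hypothesis text).
Utterance = Tuple[str, str]


def collect_adaptation_data(
    hypotheses: Iterable[Utterance], new_words: Set[str]
) -> List[Utterance]:
    """Keep utterances whose hypothesis contains at least one new word."""
    selected = []
    for utt_id, text in hypotheses:
        # Lowercase and tokenize so matching ignores case and punctuation.
        tokens = set(re.findall(r"[\w'-]+", text.lower()))
        if tokens & new_words:
            selected.append((utt_id, text))
    return selected


def run_continual_learning(
    talks: Iterable[Tuple[str, List[str]]],
    decode: Callable[[str, Set[str]], List[Utterance]],
    adapt: Callable[[List[Utterance]], None],
) -> List[Utterance]:
    """Iterate over talks: biased decoding, data collection, adaptation.

    `decode` stands in for the memory-enhanced ASR model biased toward
    the given words; `adapt` stands in for training the adaptation
    weights. Both are placeholders supplied by the caller (assumption).
    """
    adaptation_set: List[Utterance] = []
    for audio, slide_words in talks:
        # Assumption: all slide words are treated as new-word candidates.
        new_words = {w.lower() for w in slide_words}
        hypotheses = decode(audio, new_words)
        adaptation_set.extend(collect_adaptation_data(hypotheses, new_words))
        # Assumption: adapt after each talk on the cumulative data set.
        adapt(adaptation_set)
    return adaptation_set
```

In this sketch the decoded hypotheses serve as training targets, which is what makes the loop self-supervised: no human transcripts are needed, only the slides that accompany each talk.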

