CL · AI · Jun 2, 2021

Attention-based Contextual Language Model Adaptation for Speech Recognition

arXiv:2106.01451v1713 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for better contextual adaptation in voice assistants, though it is an incremental advance over existing methods.

The paper tackles the problem of incorporating non-linguistic contextual information into language models for speech recognition. It reports a 7.0% relative reduction in perplexity on a voice assistant dataset and a 9.0% relative reduction on long-tail utterances, both measured against a standard LM that uses no contextual information.
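To make the "relative" figures concrete, here is a minimal sketch of how perplexity and a relative perplexity reduction are usually computed. The function names and the numeric values are illustrative assumptions, not taken from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction(baseline_ppl, model_ppl):
    """Perplexity reduction expressed as a fraction of the baseline."""
    return (baseline_ppl - model_ppl) / baseline_ppl


# Hypothetical perplexities, chosen only to illustrate a 7.0% relative reduction.
baseline_ppl = 100.0
contextual_ppl = 93.0
print(f"{relative_reduction(baseline_ppl, contextual_ppl):.1%}")
```

A relative reduction depends on the baseline: the same absolute drop in perplexity is a larger relative gain when the baseline perplexity is lower.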

Language modeling (LM) for automatic speech recognition (ASR) does not usually incorporate utterance level contextual information. For some domains like voice assistants, however, additional context, such as the time at which an utterance was spoken, provides a rich input signal. We introduce an attention mechanism for training neural speech recognition language models on both text and non-linguistic contextual data. When applied to a large de-identified dataset of utterances collected by a popular voice assistant platform, our method reduces perplexity by 7.0% relative over a standard LM that does not incorporate contextual information. When evaluated on utterances extracted from the long tail of the dataset, our method improves perplexity by 9.0% relative over a standard LM and by over 2.8% relative when compared to a state-of-the-art model for contextual LM.
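The abstract does not specify the attention variant, so the following is only a sketch of the general idea under stated assumptions: a word embedding acts as the query, scaled dot-product attention pools a set of context embeddings, and the pooled vector is concatenated with the word embedding before it enters the LM. The embedding values, the device-type signal, and the concatenation step are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, context_vectors):
    """Scaled dot-product attention of one query over context embeddings."""
    scale = math.sqrt(len(query))
    scores = [
        sum(q * c for q, c in zip(query, vec)) / scale
        for vec in context_vectors
    ]
    weights = softmax(scores)
    dim = len(context_vectors[0])
    # Weighted sum of the context embeddings, one dimension at a time.
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, context_vectors))
        for i in range(dim)
    ]
    return pooled, weights


# Hypothetical 3-dimensional embeddings for illustration only.
word_embedding = [0.2, -0.1, 0.4]
context_embeddings = [
    [0.5, 0.0, 0.1],   # e.g. an hour-of-day embedding (time is named in the abstract)
    [-0.3, 0.2, 0.0],  # e.g. a device-type embedding (assumed signal)
]

pooled, weights = attend(word_embedding, context_embeddings)
# Assumed combination step: concatenate word and pooled context vectors.
lm_input = word_embedding + pooled
```

Because the weights are recomputed for each query, the model can rely more on one context signal for some words and less for others, instead of using a fixed mix.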

Code Implementations: 1 repo
