CL IR SD AS, Nov 14, 2023

Retrieve and Copy: Scaling ASR Personalization to Large Catalogs

MILA
arXiv:2311.08402v1134 citations, h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses a limitation of existing ASR personalization methods, which are restricted to small catalogs, and so enables broader real-world use in applications that must recognize rare words and domain-specific entities.

The paper scales automatic speech recognition (ASR) personalization to large catalogs by proposing a "Retrieve and Copy" mechanism and a training strategy. It reports up to 6% more Word Error Rate reduction (WERR) and a 3.6% absolute F1 improvement over a strong baseline, supports catalogs of up to 20K entities, and achieves at least 20% inference speedup per acoustic frame.

Personalization of automatic speech recognition (ASR) models is a widely studied topic because of its many practical applications. Most recently, attention-based contextual biasing techniques have been used to improve the recognition of rare words and domain-specific entities. However, due to performance constraints, the biasing is often limited to a few thousand entities, restricting real-world usability. To address this, we first propose a "Retrieve and Copy" mechanism to improve latency while retaining accuracy even when scaled to a large catalog. We also propose a training strategy to overcome the degradation in recall at such scale due to an increased number of confusing entities. Overall, our approach achieves up to 6% more Word Error Rate reduction (WERR) and 3.6% absolute improvement in F1 when compared to a strong baseline. Our method also allows for large catalog sizes of up to 20K without significantly affecting WER and F1-scores, while achieving at least 20% inference speedup per acoustic frame.
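
The abstract reports a relative metric (WERR) alongside an absolute one (F1), which are easy to conflate. The sketch below uses the standard definitions of relative word error rate reduction and F1. It is not taken from the paper, and the function names and all numbers in the usage lines are hypothetical, chosen only to illustrate the arithmetic.

```python
def werr(baseline_wer: float, system_wer: float) -> float:
    """Relative word error rate reduction over a baseline, in percent.

    Standard definition (an assumption here; the abstract does not spell it out):
    WERR = 100 * (WER_baseline - WER_system) / WER_baseline
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 as the harmonic mean of precision and recall over entity hits."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values, not results from the paper.
example_werr = werr(baseline_wer=10.0, system_wer=8.0)
example_f1 = f1_score(true_positives=80, false_positives=20, false_negatives=20)
print(f"WERR: {example_werr:.1f}%  F1: {example_f1:.3f}")
```

Under these definitions, an absolute F1 improvement is a difference in percentage points between two F1 scores, whereas WERR is a ratio relative to the baseline WER.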
