AS | CL | SD | Sep 29, 2023

Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization

arXiv:2309.17267v1 | h-index: 1 | Has Code
Originality: Synthesis-oriented
AI Analysis

The paper addresses the challenge of improving ASR accuracy on diverse rare terms, such as proper names. Its contribution is incremental: it builds on existing customization methods and adds a new dataset.

The paper tackles the problem of customizing automatic speech recognition (ASR) for rare and out-of-vocabulary phrases by creating a large-scale synthetic dataset. Injecting hard negative biasing phrases into the training examples decreases word error rate (WER) and the number of false alarms.

We present a first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR) with a focus on diverse rare and out-of-vocabulary (OOV) phrases, such as proper names or terms. The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulating non-trivial biasing lists for the customization task. Furthermore, we propose injecting two types of "hard negatives" into the simulated biasing lists in training examples and describe our procedures to automatically mine them. We report experiments with training an open-source customization model on the proposed dataset and show that the injection of hard negative biasing phrases decreases WER and the number of false alarms.
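As a rough illustration of what a simulated biasing list with hard negatives might look like, here is a minimal Python sketch. It is not the authors' pipeline: the function name `build_biasing_list`, its parameters, and the single kind of hard negative used here (look-alike phrases supplied by the caller) are all assumptions made for illustration. The abstract does not say what the paper's two types of hard negatives are or how they are mined.

```python
import random


def build_biasing_list(targets, candidate_pool, similar_phrases,
                       size=10, n_hard=2, rng=None):
    """Assemble one simulated biasing list (hypothetical helper).

    targets         -- phrases that really occur in the reference text
    candidate_pool  -- unrelated phrases used as random (easy) negatives
    similar_phrases -- dict: target -> look-alike phrases (assumed hard negatives)
    """
    rng = rng or random.Random(0)

    # Collect hard negatives: phrases that resemble a target but are not targets.
    hard = []
    for target in targets:
        for phrase in similar_phrases.get(target, []):
            if phrase not in targets and phrase not in hard:
                hard.append(phrase)
    hard = hard[:n_hard]

    # Fill the remaining slots with random phrases not already in the list.
    taken = set(targets) | set(hard)
    easy_pool = [p for p in candidate_pool if p not in taken]
    n_easy = max(0, size - len(targets) - len(hard))
    easy = rng.sample(easy_pool, min(n_easy, len(easy_pool)))

    # Shuffle so that position carries no information about correctness.
    biasing = list(targets) + hard + easy
    rng.shuffle(biasing)

    # Label each phrase: True if it should be applied, False otherwise.
    labels = {p: (p in targets) for p in biasing}
    return biasing, labels


if __name__ == "__main__":
    biasing, labels = build_biasing_list(
        targets=["Dostoevsky"],
        candidate_pool=["Paris", "quantum", "Mozart", "river"],
        similar_phrases={"Dostoevsky": ["Dostoyevski", "Tchaikovsky"]},
        size=5,
    )
    for phrase in biasing:
        print(phrase, labels[phrase])
```

A model trained on lists like this has to reject phrases that look or sound close to the correct one. That is the intuition behind the abstract's claim that hard negatives reduce false alarms.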

