AS CL SD · Sep 29, 2023

Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization

arXiv:2309.17267v1 · 1.2 · h-index: 1 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of improving ASR accuracy on diverse rare terms, such as proper names. It is incremental, however: it builds on existing customization methods and contributes a new dataset.

The paper tackles the customization of automatic speech recognition (ASR) for rare and out-of-vocabulary phrases by creating a large-scale synthetic dataset. Injecting hard negative biasing phrases into the training examples decreases word error rate (WER) and the number of false alarms.

We present a first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR) with a focus on diverse rare and out-of-vocabulary (OOV) phrases, such as proper names or terms. The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulating non-trivial biasing lists for the customization task. Furthermore, we propose injecting two types of "hard negatives" into the simulated biasing lists in training examples and describe our procedures to automatically mine them. We report experiments with training an open-source customization model on the proposed dataset and show that the injection of hard negative biasing phrases decreases WER and the number of false alarms.
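As a rough illustration of what a simulated biasing list with hard negatives might look like, here is a minimal Python sketch. It is not the authors' procedure: the abstract does not say what the two types of hard negatives are or how they are mined, so the sketch assumes a single stand-in criterion (character-level similarity to the target phrase). The function names `similarity` and `build_biasing_list`, the parameters `n_hard` and `n_random`, and the example vocabulary are all hypothetical.

```python
import difflib
import random


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; an assumed stand-in for a real confusability measure."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_biasing_list(target, vocabulary, n_hard=2, n_random=2, seed=0):
    """Return (shuffled biasing list, hard negatives) for one training example."""
    rng = random.Random(seed)
    candidates = [p for p in vocabulary if p != target]
    # Hard negatives (assumption): phrases that look most like the target but are not it.
    ranked = sorted(candidates, key=lambda p: similarity(p, target), reverse=True)
    hard = ranked[:n_hard]
    # Random distractors: sampled from the phrases that were not picked as hard negatives.
    rest = ranked[n_hard:]
    distractors = rng.sample(rest, min(n_random, len(rest)))
    biasing_list = [target] + hard + distractors
    rng.shuffle(biasing_list)
    return biasing_list, hard


# Hypothetical vocabulary of rare phrases; not taken from the dataset.
vocabulary = ["Tchaikovsky", "Tchaikovskaya", "Chomsky", "Kandinsky",
              "Nairobi", "Okonkwo", "Reykjavik"]
biasing_list, hard_negatives = build_biasing_list("Tchaikovsky", vocabulary)
print(biasing_list)
print(hard_negatives)
```

The idea behind such a list is that a customization model trained on it has to learn to reject near-miss phrases rather than replace text with any candidate that merely resembles it, which is the behavior the abstract associates with fewer false alarms.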
