Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization
This work addresses the challenge of improving ASR accuracy for diverse rare terms, such as proper names. The contribution is incremental: it builds on existing customization methods and adds a new dataset.
The paper tackles the problem of customizing automatic speech recognition (ASR) for rare and out-of-vocabulary phrases by creating a large-scale synthetic dataset. Training a customization model on this dataset with injected hard negative biasing phrases decreases word error rate (WER) and the number of false alarms.
We present the first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR), with a focus on diverse rare and out-of-vocabulary (OOV) phrases, such as proper names or terms. The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulating non-trivial biasing lists for the customization task. Furthermore, we propose injecting two types of ``hard negatives'' into the simulated biasing lists in training examples and describe our procedures to automatically mine them. We report experiments with training an open-source customization model on the proposed dataset and show that the injection of hard negative biasing phrases decreases WER and the number of false alarms.
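To make the idea of a simulated biasing list concrete, here is a minimal Python sketch of how one might be assembled for a single training example. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not define the two types of hard negatives, so this sketch assumes a hard negative is simply a phrase whose spelling is close to the target. The functions `similarity`, `mine_hard_negatives`, and `build_biasing_list`, along with the toy vocabulary, are hypothetical.

```python
import random
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed proxy for confusability)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mine_hard_negatives(target: str, vocabulary: list[str], k: int = 2) -> list[str]:
    """Return the k phrases most similar in spelling to the target, excluding it."""
    candidates = [p for p in vocabulary if p.lower() != target.lower()]
    candidates.sort(key=lambda p: similarity(target, p), reverse=True)
    return candidates[:k]


def build_biasing_list(
    target: str,
    vocabulary: list[str],
    n_random: int = 3,
    n_hard: int = 2,
    seed: int = 0,
) -> list[str]:
    """Combine the target, hard negatives, and random negatives, then shuffle."""
    rng = random.Random(seed)
    hard = mine_hard_negatives(target, vocabulary, n_hard)
    # Random ("easy") negatives are drawn from whatever is left.
    pool = [p for p in vocabulary if p != target and p not in hard]
    easy = rng.sample(pool, min(n_random, len(pool)))
    biasing = [target] + hard + easy
    rng.shuffle(biasing)  # the position of the target should carry no signal
    return biasing


# Toy vocabulary, invented for illustration only.
vocabulary = [
    "Jonathan Swift",
    "Jonathon Swyft",
    "Taylor Swift",
    "Marie Curie",
    "Alan Turing",
    "Niels Bohr",
    "Ada Lovelace",
]
hard_negatives = mine_hard_negatives("Jonathan Swift", vocabulary)
biasing_list = build_biasing_list("Jonathan Swift", vocabulary)
print(hard_negatives)
print(biasing_list)
```

The point of mixing in near-miss phrases is that a model trained only on random negatives can learn to accept any roughly similar candidate, which is one plausible source of false alarms. A real pipeline would likely measure similarity on pronunciations or observed ASR confusions rather than raw spelling.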