CL · SD · AS · May 22, 2025

From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition

arXiv:2505.16972v1 · 2 citations · h-index: 4 · EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses the scarcity of speech data for multilingual ASR systems with a scalable pipeline. The advance is incremental: it builds on existing TTS and ASR methods.

The paper tackles the challenge of limited speech data for multilingual automatic speech recognition by introducing Speech Back-Translation, a pipeline that converts large text corpora into synthetic speech using text-to-speech models. The authors generate over 500,000 hours of synthetic speech across ten languages and use it to continue pre-training Whisper-large-v3, reducing average transcription errors by over 30%.

Recent advances in Automatic Speech Recognition (ASR) have been largely fueled by massive speech corpora. However, extending coverage to diverse languages with limited resources remains a formidable challenge. This paper introduces Speech Back-Translation, a scalable pipeline that improves multilingual ASR models by converting large-scale text corpora into synthetic speech via off-the-shelf text-to-speech (TTS) models. We demonstrate that just tens of hours of real transcribed speech can effectively train TTS models to generate synthetic speech at hundreds of times the original volume while maintaining high quality. To evaluate synthetic speech quality, we develop an intelligibility-based assessment framework and establish clear thresholds for when synthetic data benefits ASR training. Using Speech Back-Translation, we generate more than 500,000 hours of synthetic speech in ten languages and continue pre-training Whisper-large-v3, achieving average transcription error reductions of over 30%. These results highlight the scalability and effectiveness of Speech Back-Translation for enhancing multilingual ASR systems.
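
The abstract mentions an intelligibility-based assessment with thresholds for when synthetic data helps ASR training, but this page does not give the metric or the threshold values. As a rough illustration only, the sketch below assumes intelligibility is measured as the word error rate (WER) between the text fed to the TTS model and an ASR transcript of the resulting audio. The names `word_error_rate`, `filter_synthetic`, the `transcribe` callback, and the `max_wer=0.1` cutoff are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Iterable, List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # Keep only the previous row of the edit-distance table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def filter_synthetic(
    pairs: Iterable[Tuple[str, str]],
    transcribe: Callable[[str], str],
    max_wer: float = 0.1,  # hypothetical cutoff, not a value from the paper
) -> List[Tuple[str, str]]:
    """Keep (text, audio_path) pairs whose ASR transcript stays close to the text.

    `transcribe` is a placeholder for any ASR system that maps an audio
    path to a transcript string.
    """
    kept = []
    for text, audio_path in pairs:
        if word_error_rate(text, transcribe(audio_path)) <= max_wer:
            kept.append((text, audio_path))
    return kept
```

In this sketch, a synthetic utterance is discarded when the ASR transcript drifts too far from the source text; the cutoff would have to be tuned per language and per TTS model.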

Code Implementations: 1 repo
