CL · Sep 18, 2025

Frustratingly Easy Data Augmentation for Low-Resource ASR

arXiv:2509.15373v2 · 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses data scarcity in ASR for low-resource languages, offering incremental improvements through straightforward augmentation techniques.

The paper tackles low-resource automatic speech recognition (ASR) with three data augmentation methods that generate new text and then synthesize audio for it with text-to-speech. Applied to four languages with limited data, the methods yield significant performance gains, including a 14.3% absolute WER reduction for Nashta.
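To make the "absolute WER reduction" figure concrete, here is a minimal sketch of how word error rate is commonly computed (word-level edit distance divided by reference length) and how an absolute reduction differs from a relative one. The function names and the example scores are illustrative assumptions; they are not taken from the paper's code or results.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed part of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, expressed in the same units as the inputs."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Reduction as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical scores, chosen only to show the arithmetic.
baseline, augmented = 0.60, 0.45
abs_drop = absolute_reduction(baseline, augmented)
rel_drop = relative_reduction(baseline, augmented)
```

With these made-up scores the absolute reduction is 15 WER points while the relative reduction is 25%, which is why reports usually state which of the two they mean.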

This paper introduces three self-contained data augmentation methods for low-resource Automatic Speech Recognition (ASR). Our techniques first generate novel text--using gloss-based replacement, random replacement, or an LLM-based approach--and then apply Text-to-Speech (TTS) to produce synthetic audio. We apply these methods, which leverage only the original annotated data, to four languages with extremely limited resources (Vatlongos, Nashta, Shinekhen Buryat, and Kakabe). Fine-tuning a pretrained Wav2Vec2-XLSR-53 model on a combination of the original audio and generated synthetic data yields significant performance gains, including a 14.3% absolute WER reduction for Nashta. The methods prove effective across all four low-resource languages and also show utility for high-resource languages like English, demonstrating their broad applicability.
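The text-generation step can be sketched as follows. This is one plausible reading of "random replacement" and "gloss-based replacement", not the authors' implementation: the toy corpus, the gloss tags (`N`, `V`), the replacement rate, and all function names are assumptions made for illustration. The LLM-based variant and the TTS step are omitted.

```python
import random
from collections import defaultdict

# Toy annotated corpus: each sentence is a list of (word, gloss_tag) pairs.
# Words and tags are hypothetical placeholders, not data from the paper.
CORPUS = [
    [("dog", "N"), ("sees", "V"), ("bird", "N")],
    [("child", "N"), ("eats", "V"), ("fish", "N")],
    [("woman", "N"), ("hears", "V"), ("dog", "N")],
]


def build_indexes(corpus):
    """Return (all word types, word types grouped by gloss tag)."""
    by_tag = defaultdict(set)
    for sentence in corpus:
        for word, tag in sentence:
            by_tag[tag].add(word)
    vocab = sorted({w for words in by_tag.values() for w in words})
    return vocab, {tag: sorted(words) for tag, words in by_tag.items()}


def random_replace(sentence, vocab, rate, rng):
    """Swap each word, with probability `rate`, for any word in the corpus."""
    return [rng.choice(vocab) if rng.random() < rate else word
            for word, _tag in sentence]


def gloss_replace(sentence, by_tag, rate, rng):
    """Swap each word, with probability `rate`, for a word sharing its gloss tag."""
    return [rng.choice(by_tag[tag]) if rng.random() < rate else word
            for word, tag in sentence]


rng = random.Random(0)  # fixed seed so runs are repeatable
VOCAB, BY_TAG = build_indexes(CORPUS)
new_random = [random_replace(s, VOCAB, 0.5, rng) for s in CORPUS]
new_gloss = [gloss_replace(s, BY_TAG, 0.5, rng) for s in CORPUS]
# Each generated sentence would then go to a TTS system to produce synthetic audio.
```

Both functions draw only on the original annotated data, matching the self-contained setup described above. The gloss-constrained variant keeps each slot's category fixed, so its output is more likely to stay well-formed than fully random swaps.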
