RO-N3WS: Enhancing Generalization in Low-Resource ASR with Diverse Romanian Speech Benchmarks

Alexandra Diaconu, Mădălina Vînaga, Bogdan Alexe

arXiv:2603.02368v1

Originality: Synthesis-oriented

AI Analysis

This work addresses domain adaptation and out-of-distribution (OOD) performance for Romanian ASR. The contribution is incremental, since it applies existing methods to new data.

The authors address generalization in low-resource Romanian automatic speech recognition by introducing RO-N3WS, a diverse speech dataset, and find that fine-tuning on it substantially reduces word error rate (WER) relative to zero-shot baselines.

We introduce RO-N3WS, a benchmark Romanian speech dataset designed to improve generalization in automatic speech recognition (ASR), particularly in low-resource and out-of-distribution (OOD) conditions. RO-N3WS comprises over 126 hours of transcribed audio collected from broadcast news, literary audiobooks, film dialogue, children's stories, and conversational podcast speech. This diversity enables robust training and fine-tuning across stylistically distinct domains. We evaluate several state-of-the-art ASR systems (Whisper, Wav2Vec 2.0) in both zero-shot and fine-tuned settings, and conduct controlled comparisons using synthetic data generated with expressive TTS models. Our results show that even limited fine-tuning on real speech from RO-N3WS yields substantial WER improvements over zero-shot baselines. We will release all models, scripts, and data splits to support reproducible research in multilingual ASR, domain adaptation, and lightweight deployment.
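
The abstract reports its results as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and the sketch applies no text normalization; the authors' exact scoring pipeline (casing, punctuation, and diacritics handling) is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: whitespace tokenization, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a missing diacritic counts as a full word error
# when no normalization is applied.
wer = word_error_rate("bună ziua tuturor", "buna ziua tuturor")
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many insertions. For Romanian in particular, whether diacritics are normalized before scoring can change the reported number noticeably, so comparisons between systems are only meaningful under the same normalization.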
