SD CL LG AS · Sep 27, 2023

Speech collage: code-switched audio generation by collaging monolingual corpora

Amir Hussein, Dorsa Zeinali, Ondřej Klejch, Matthew Wiesner, Brian Yan, Shammur Chowdhury, Ahmed Ali, Shinji Watanabe, Sanjeev Khudanpur

arXiv:2309.15674v1 · 12.412 citations · h-index: 83 · Has Code

Originality: Incremental advance

AI Analysis

Speech Collage addresses data scarcity for researchers and developers building automatic speech recognition systems for code-switching. The advance is incremental, since it builds on existing monolingual data and splicing methods.

The paper tackles the problem of data scarcity for code-switched automatic speech recognition by introducing Speech Collage, a method that synthesizes code-switched audio from monolingual corpora through splicing and overlap-add techniques, resulting in up to 34.4% and 16.2% relative reductions in error rates for in-domain and zero-shot scenarios, respectively.

Designing effective automatic speech recognition (ASR) systems for Code-Switching (CS) often depends on the availability of the transcribed CS resources. To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS data from monolingual corpora by splicing audio segments. We further improve the smoothness quality of audio generation using an overlap-add approach. We investigate the impact of generated data on speech recognition in two scenarios: using in-domain CS text and a zero-shot approach with synthesized CS text. Empirical results highlight up to 34.4% and 16.2% relative reductions in Mixed-Error Rate and Word-Error Rate for in-domain and zero-shot scenarios, respectively. Lastly, we demonstrate that CS augmentation bolsters the model's code-switching inclination and reduces its monolingual bias.
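The splicing and overlap-add idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `overlap_add_splice`, the `overlap` parameter, and the linear cross-fade window are all assumptions made for illustration, since the abstract does not specify the window shape or how segment boundaries are chosen. Audio segments are represented as plain lists of float samples.

```python
def overlap_add_splice(segments, overlap):
    """Concatenate audio segments, cross-fading `overlap` samples at each joint.

    Hypothetical helper: `segments` is a list of sample lists (floats),
    `overlap` is the number of samples blended at every boundary.
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    if not segments:
        return []

    out = list(segments[0])
    for seg in segments[1:]:
        seg = list(seg)
        # Never blend more samples than either side actually has.
        n = min(overlap, len(out), len(seg))
        start = len(out) - n
        for i in range(n):
            # Linear fade: the incoming segment's weight rises from 0 towards 1,
            # and the outgoing tail's weight is the complement (assumed window).
            w = (i + 1) / (n + 1)
            out[start + i] = (1.0 - w) * out[start + i] + w * seg[i]
        # Append the part of the incoming segment that was not blended.
        out.extend(seg[n:])
    return out


if __name__ == "__main__":
    # Two toy "monolingual" segments: one at amplitude 1.0, one at 0.0.
    lang_a = [1.0, 1.0, 1.0, 1.0]
    lang_b = [0.0, 0.0, 0.0, 0.0]
    print(overlap_add_splice([lang_a, lang_b], overlap=2))
```

With `overlap=0` the function reduces to plain concatenation, which corresponds to splicing without smoothing. A larger overlap shortens the output by `overlap` samples per joint and replaces the hard boundary with a gradual transition.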
