CL · SD · AS · Jan 7, 2022

Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition

arXiv:2201.02550v2 · 15 citations
AI Analysis

This work addresses the challenge of recognizing mixed-language speech from Arabic-English speakers. It is incremental, however, as it builds on existing methods such as random lexical replacement and the Equivalence Constraint (EC).

The paper tackles data scarcity in Arabic-English code-switching speech recognition with a zero-shot learning method that augments monolingual data with artificially generated code-switching text. This yields a 65.5% relative reduction in language model perplexity and a 7.7% reduction in ASR word error rate on two code-switching test sets.
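The perplexity gain is reported as a relative reduction, that is, as a fraction of the baseline value rather than an absolute difference. As a worked illustration (the baseline perplexity of 1000 is purely hypothetical, not a figure from the paper):

$$
\Delta_{\text{rel}} = \frac{\mathrm{PPL}_{\text{base}} - \mathrm{PPL}_{\text{aug}}}{\mathrm{PPL}_{\text{base}}},
\qquad
\mathrm{PPL}_{\text{aug}} = (1 - 0.655)\,\mathrm{PPL}_{\text{base}} = 0.345 \times 1000 = 345.
$$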

The pervasiveness of intra-utterance code-switching (CS) in spoken content requires automatic speech recognition (ASR) systems to handle mixed language. Designing a CS-ASR system poses many challenges, mainly due to data scarcity, grammatical structure complexity, and domain mismatch. The most common way to address CS is to train an ASR system on the available transcribed CS speech together with monolingual data. In this work, we propose a zero-shot learning methodology for CS-ASR that augments the monolingual data with artificially generated CS text. Our approach is based on random lexical replacements and the Equivalence Constraint (EC), exploiting aligned translation pairs to generate random and grammatically valid CS content. Our empirical results show a 65.5% relative reduction in language model perplexity and a 7.7% reduction in ASR WER on two ecologically valid CS test sets. Human evaluation of the text generated using EC suggests that more than 80% of it is of adequate quality.
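To make the two generation strategies more concrete, here is a minimal Python sketch. It assumes a tokenized Arabic sentence, its English translation, and a set of word-alignment links `(source_index, target_index)` produced by some external aligner; all function names are hypothetical. The EC check is a simplified, alignment-based approximation (a switch point is accepted only if no alignment link crosses it), not the paper's actual implementation.

```python
import random
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical representation: (source_index, target_index) word-alignment links,
# assumed to come from an external aligner run on an Arabic-English translation pair.
Alignment = Set[Tuple[int, int]]


def random_lexical_replacement(
    src: Sequence[str],
    tgt: Sequence[str],
    align: Alignment,
    p: float = 0.3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Swap each aligned source word for its aligned target word(s) with probability p."""
    rng = rng or random.Random()
    links: Dict[int, List[int]] = {}
    for s, t in align:
        links.setdefault(s, []).append(t)
    out: List[str] = []
    for i, tok in enumerate(src):
        if i in links and rng.random() < p:
            # Emit the aligned target words in target order; unaligned words are kept.
            out.extend(tgt[t] for t in sorted(links[i]))
        else:
            out.append(tok)
    return out


def ec_switch_points(n_src: int, align: Alignment) -> List[Tuple[int, int]]:
    """Return (k, j) boundaries where src[:k] + tgt[j:] preserves both word orders.

    Simplified EC approximation (an assumption): a switch after source position k is
    allowed only if every link from src[:k] lands before every link from src[k:].
    """
    points: List[Tuple[int, int]] = []
    for k in range(1, n_src):
        left = [t for s, t in align if s < k]
        right = [t for s, t in align if s >= k]
        if left and right and max(left) < min(right):
            points.append((k, max(left) + 1))
    return points


def ec_generate(
    src: Sequence[str],
    tgt: Sequence[str],
    align: Alignment,
    rng: Optional[random.Random] = None,
) -> Optional[List[str]]:
    """Join a source prefix to a target suffix at a randomly chosen valid switch point."""
    rng = rng or random.Random()
    points = ec_switch_points(len(src), align)
    if not points:
        return None  # no non-crossing boundary exists for this pair
    k, j = rng.choice(points)
    return list(src[:k]) + list(tgt[j:])


# Toy example with placeholder tokens: a2 and a3 appear in swapped order in the target.
src = ["a1", "a2", "a3"]
tgt = ["e1", "e2", "e3"]
align = {(0, 0), (1, 2), (2, 1)}
print(random_lexical_replacement(src, tgt, align, p=1.0))  # ['e1', 'e3', 'e2']
print(ec_generate(src, tgt, align, rng=random.Random(0)))  # ['a1', 'e2', 'e3']
```

With real data, `src` would hold Arabic tokens and `tgt` their English translation; the placeholders only illustrate the index bookkeeping. In the toy pair, the boundary after `a2` is rejected because the links for `a2` and `a3` cross, which is the kind of ungrammatical juxtaposition the EC is meant to filter out.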

