CL · AS · Oct 7, 2021

Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models

arXiv:2110.03504v1 · 19 citations
Originality: Incremental advance
AI Analysis

This work addresses speech recognition for bilingual conversations and offers an incremental improvement in handling language alternation when little transcribed data is available.

The paper tackles Mandarin-English code-switching speech recognition by leveraging self-supervised learning (SSL) models pre-trained on unlabeled speech. Jointly training CTC and language identification (LID) modules on top of the SSL representations improves recognition, and multilingual pre-training yields the best results.

Code-switching (CS) is common in daily conversations, where more than one language is used within a sentence. The difficulties of CS speech recognition lie in the alternation between languages and the lack of transcribed data. Therefore, this paper uses recently successful self-supervised learning (SSL) methods to leverage large amounts of unlabeled speech data that contain no CS. We show that the hidden representations of SSL models carry frame-level language identity even when the models are trained on English speech only. Jointly training CTC and language identification modules on self-supervised speech representations improves CS speech recognition performance. Furthermore, using multilingual speech data for pre-training yields the best CS speech recognition performance.
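One common way to realize the joint training described above is a weighted sum of a CTC loss on the token outputs and a frame-level cross-entropy loss on the language-ID outputs, with both heads reading the same SSL features. The pure-Python sketch below illustrates that idea under stated assumptions: the interpolation weight `lam`, its default of 0.3, the two-way LID label set, and all function names are hypothetical and not taken from the paper, which may weight or formulate the LID objective differently.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_softmax(row):
    """Convert one frame of raw scores into log-probabilities."""
    z = logsumexp(row)
    return [x - z for x in row]

def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC (forward algorithm).

    log_probs: T x V per-frame log-probabilities; target: label ids without blanks.
    """
    ext = [blank]  # target interleaved with blanks: [b, y1, b, y2, ..., b]
    for y in target:
        ext += [y, blank]
    S, T = len(ext), len(log_probs)
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        new = [-math.inf] * S
        for s in range(S):
            cands = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])  # advance by one
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(cands) + log_probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or the trailing blank.
    total = alpha[0] if S == 1 else logsumexp([alpha[S - 1], alpha[S - 2]])
    return -total

def frame_lid_loss(lid_log_probs, lid_labels):
    """Mean per-frame cross-entropy for language IDs (hypothetical: 0=Mandarin, 1=English)."""
    return -sum(lp[y] for lp, y in zip(lid_log_probs, lid_labels)) / len(lid_labels)

def joint_loss(asr_logits, target, lid_logits, lid_labels, lam=0.3):
    """Hypothetical joint objective: (1 - lam) * CTC + lam * frame-level LID."""
    asr_lp = [log_softmax(r) for r in asr_logits]
    lid_lp = [log_softmax(r) for r in lid_logits]
    return (1 - lam) * ctc_loss(asr_lp, target) + lam * frame_lid_loss(lid_lp, lid_labels)
```

For example, with uniform scores over a 3-symbol vocabulary (blank plus two tokens) and 2 frames, the alignments `1 b`, `b 1`, and `1 1` all collapse to the target `[1]`, so `ctc_loss` returns log 3. The weighted sum keeps both heads on shared features, so gradients from the LID term can push the encoder toward language-aware frame representations; how strongly depends on the assumed `lam`.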
