CL, AS · Oct 7, 2021

Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models

Liang-Hsuan Tseng, Yu-Kuan Fu, Heng-Jui Chang, Hung-yi Lee

arXiv:2110.03504v11.819 citations

Originality: Incremental advance

AI Analysis

This work addresses speech recognition for bilingual conversations and offers an incremental improvement in handling language alternation when transcribed data are limited.

The paper tackles Mandarin-English code-switching speech recognition by leveraging self-supervised learning models trained on unlabeled speech data. Jointly training CTC and language identification modules improves performance, and multilingual pre-training yields the best results.

Code-switching (CS) is common in daily conversations in which more than one language is used within a sentence. The difficulties of CS speech recognition lie in the alternation between languages and the lack of transcribed data. Therefore, this paper uses recently successful self-supervised learning (SSL) methods to leverage large amounts of unlabeled speech data that contain no CS. We show that the hidden representations of SSL models contain frame-level language identity information even when the models are trained on English speech only. Jointly training CTC and language identification modules on self-supervised speech representations improves CS speech recognition performance. Furthermore, using multilingual speech data for pre-training yields the best CS speech recognition performance.
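The joint training described in the abstract can be sketched as a weighted sum of a CTC loss over the output tokens and a frame-level language-identification (LID) cross-entropy, with both heads reading the same per-frame SSL features. This is a minimal sketch under stated assumptions, not the paper's formulation: the weight `lid_weight`, the hypothetical LID label set (0 = blank/other, 1 = Mandarin, 2 = English), and all function names are illustrative. Logits are plain Python lists standing in for the outputs of linear heads on top of the SSL representations.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))); returns -inf if all inputs are -inf."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_softmax(logits: Sequence[float]) -> List[float]:
    lse = logsumexp(logits)
    return [x - lse for x in logits]


def ctc_loss(log_probs: List[List[float]], target: List[int], blank: int = 0) -> float:
    """CTC negative log-likelihood via the forward (alpha) recursion.

    log_probs: T frames x V tokens, already log-softmaxed.
    target: token ids without blanks.
    """
    # Extended label sequence: blank, y1, blank, y2, ..., blank
    ext = [blank]
    for token in target:
        ext += [token, blank]
    num_states = len(ext)

    alpha = [NEG_INF] * num_states
    alpha[0] = log_probs[0][blank]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [NEG_INF] * num_states
        for s in range(num_states):
            candidates = [alpha[s]]  # stay in the same state
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one state
            # Skip over a blank only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    # A valid path ends on the last label or on the trailing blank
    final = [alpha[-1]] + ([alpha[-2]] if num_states > 1 else [])
    return -logsumexp(final)


def frame_lid_loss(lid_log_probs: List[List[float]], lid_labels: List[int]) -> float:
    """Mean per-frame cross-entropy for language identification."""
    total = -sum(lp[y] for lp, y in zip(lid_log_probs, lid_labels))
    return total / len(lid_labels)


def joint_loss(
    asr_logits: List[List[float]],
    target: List[int],
    lid_logits: List[List[float]],
    lid_labels: List[int],
    lid_weight: float = 0.5,  # hypothetical weighting, not taken from the paper
    blank: int = 0,
) -> float:
    """CTC loss plus a weighted frame-level LID loss on the same frames."""
    if not (len(asr_logits) == len(lid_logits) == len(lid_labels)):
        raise ValueError("ASR and LID heads must produce the same number of frames")
    asr_lp = [log_softmax(frame) for frame in asr_logits]
    lid_lp = [log_softmax(frame) for frame in lid_logits]
    return ctc_loss(asr_lp, target, blank) + lid_weight * frame_lid_loss(lid_lp, lid_labels)
```

For example, with two frames of uniform logits, `joint_loss([[0.0, 0.0], [0.0, 0.0]], [1], [[0.0, 0.0, 0.0]] * 2, [1, 2])` combines a CTC term of -log(0.75) (three valid alignments of one token over two frames) with half of a uniform three-way LID cross-entropy. Because the LID head shares the SSL features with the CTC head, its gradient pushes those features to keep the frame-level language information the abstract reports.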
