SD CL AS | Oct 21, 2022

Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation

Thien Nguyen, Nathalie Tran, Liuhui Deng, Thiago Fraga da Silva, Matthew Radzihovsky, Roger Hsiao, Henry Mason, Stefan Braun, Erik McDermott, Dogan Can, Pawel Swietojanski, Lyan Verwimp

arXiv:2210.12214v1 | 8.36 citations | h-index: 27

Originality: Synthesis-oriented

AI Analysis

This work addresses code-switching speech recognition for bilingual ASR systems, representing an incremental improvement in a domain-specific task.

The study optimizes a bilingual neural transducer for code-switching speech recognition without supervised code-switching data. It achieves a 25% mixed error rate (MER) on the ASCEND English/Mandarin test set, a 2.1% absolute reduction compared to prior work.

Code-switching describes the practice of using more than one language in the same sentence. In this study, we investigate how to optimize a neural-transducer-based bilingual automatic speech recognition (ASR) model for code-switching speech. Focusing on the scenario where the ASR model is trained without supervised code-switching data, we find that semi-supervised training and synthetic code-switched data can improve the bilingual ASR system on code-switching speech. We analyze how each of the neural transducer's encoders contributes to code-switching performance by measuring encoder-specific recall values, and we evaluate our English/Mandarin system on the ASCEND data set. Our final system achieves 25% mixed error rate (MER) on the ASCEND English/Mandarin code-switching test set -- reducing the MER by 2.1% absolute compared to the previous literature -- while maintaining good accuracy on the monolingual test sets.
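The abstract reports results as mixed error rate (MER) but does not spell out the metric. A minimal sketch follows, assuming the common convention for English/Mandarin code-switching evaluation: Mandarin is scored per character, English per word, and MER is the total edit distance divided by the number of reference tokens. The function names (`mixed_tokens`, `edit_distance`, `mixed_error_rate`) and the tokenization regex are illustrative assumptions, not the authors' scoring code.

```python
import re

# Assumption: Mandarin text falls in the basic CJK Unified Ideographs block,
# and English words consist of ASCII letters and apostrophes. Digits and
# punctuation are ignored by this simplified tokenizer.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def mixed_tokens(text):
    """Split text into Mandarin characters and lowercased English words."""
    return [tok.lower() for tok in _TOKEN.findall(text)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def mixed_error_rate(refs, hyps):
    """Corpus-level MER: summed edit distance over summed reference length."""
    errors = sum(
        edit_distance(mixed_tokens(r), mixed_tokens(h))
        for r, h in zip(refs, hyps)
    )
    total = sum(len(mixed_tokens(r)) for r in refs)
    if total == 0:
        raise ValueError("reference contains no scorable tokens")
    return errors / total


if __name__ == "__main__":
    refs = ["我想吃 pizza today"]
    hyps = ["我想 pizza to day"]
    print(f"MER = {mixed_error_rate(refs, hyps):.2f}")
```

Under this reading, "2.1% absolute" refers to a difference in percentage points, so a final MER of 25% would correspond to a prior result of roughly 27.1%.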
