CL · SD · AS · Nov 29, 2021

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu

arXiv:2111.15016v1 · 2.625 citations · h-index: 84

Originality: Incremental advance

AI Analysis

This work addresses the challenge of accurately recognizing conversational bilingual speech, including intra-sentential code-switching, for use in multilingual environments.

The paper tackles bilingual speech recognition by jointly modeling monolingual and code-switched utterances in a conditionally factorized framework, and reports improved performance on Mandarin-English datasets.

Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type. In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition. By defining the monolingual sub-tasks with label-to-frame synchronization, our joint modeling framework can be conditionally factorized such that the final bilingual output, which may or may not be code-switched, is obtained given only monolingual information. We show that this conditionally factorized joint framework can be modeled by an end-to-end differentiable neural network. We demonstrate the efficacy of our proposed model on bilingual Mandarin-English speech recognition across both monolingual and code-switched corpora.
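To make label-to-frame synchronization concrete, here is a minimal, hypothetical Python sketch. It assumes each monolingual sub-task behaves like a CTC-style model that, at every frame, emits either a token of its own language or a blank symbol, and that both sub-tasks share the same frame sequence. The names `BLANK`, `monolingual_targets`, `ctc_collapse`, and `merge_frame_synchronous`, as well as the score-based tie-break, are illustrative assumptions and not taken from the paper. The paper combines the monolingual information with an end-to-end differentiable neural network; this rule-based merge only shows why frame-aligned monolingual outputs carry enough information to recover a bilingual (possibly code-switched) transcript.

```python
from typing import List, Optional, Sequence, Tuple

BLANK = "<b>"  # hypothetical CTC blank symbol shared by both sub-tasks


def is_mandarin(token: str) -> bool:
    """Treat a token as Mandarin if it contains any CJK Unified Ideograph."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in token)


def monolingual_targets(tokens: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split a (possibly code-switched) transcript into per-language targets.

    Assumption: each monolingual sub-task keeps only its own language's
    tokens, so a frame-synchronous model trained on it must emit blanks
    over the frames where the other language is spoken.
    """
    man = [t for t in tokens if is_mandarin(t)]
    eng = [t for t in tokens if not is_mandarin(t)]
    return man, eng


def ctc_collapse(frames: Sequence[str]) -> List[str]:
    """Greedy CTC decoding: merge consecutive repeats, then drop blanks."""
    out: List[str] = []
    prev: Optional[str] = None
    for label in frames:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return out


def merge_frame_synchronous(
    man_frames: Sequence[Tuple[str, float]],
    eng_frames: Sequence[Tuple[str, float]],
) -> List[str]:
    """Hypothetical rule-based merge of two frame-synchronous outputs.

    Each frame holds (label, score) from one monolingual sub-task. Since both
    sub-tasks are aligned to the same frames, the bilingual label is read off
    frame by frame: keep the non-blank label, and if both are non-blank keep
    the higher-scoring one (ties go to Mandarin).
    """
    if len(man_frames) != len(eng_frames):
        raise ValueError("monolingual outputs must cover the same frames")
    merged: List[str] = []
    for (m, m_score), (e, e_score) in zip(man_frames, eng_frames):
        if m == BLANK:
            merged.append(e)
        elif e == BLANK:
            merged.append(m)
        else:
            merged.append(m if m_score >= e_score else e)
    return merged


if __name__ == "__main__":
    # Toy code-switched utterance: "我 like 苹果" over 7 frames.
    man = [("我", 0.9), ("我", 0.8), (BLANK, 0.9), (BLANK, 0.7),
           (BLANK, 0.6), ("苹果", 0.9), ("苹果", 0.8)]
    eng = [(BLANK, 0.9), (BLANK, 0.9), (BLANK, 0.8), ("like", 0.9),
           ("like", 0.7), (BLANK, 0.9), (BLANK, 0.9)]
    print(monolingual_targets(["我", "like", "苹果"]))
    print(ctc_collapse(merge_frame_synchronous(man, eng)))
```

Running the toy example splits the transcript into `(['我', '苹果'], ['like'])` and reassembles the bilingual output `['我', 'like', '苹果']` from the two monolingual frame sequences alone. A purely monolingual utterance falls out of the same procedure: one sub-task emits blanks everywhere, so the merge simply returns the other language's tokens.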

