CL · SD · AS · Nov 29, 2021

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

arXiv:2111.15016v1 · 25 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of accurately recognizing conversational bilingual speech, including code-switched speech, for applications in multilingual environments.

The paper tackles bilingual speech recognition by jointly modeling monolingual and code-switched utterances in a conditionally factorized framework, and reports improved performance on Mandarin-English datasets.

Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type. In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition. By defining the monolingual sub-tasks with label-to-frame synchronization, our joint modeling framework can be conditionally factorized such that the final bilingual output, which may or may not be code-switched, is obtained given only monolingual information. We show that this conditionally factorized joint framework can be modeled by an end-to-end differentiable neural network. We demonstrate the efficacy of our proposed model on bilingual Mandarin-English speech recognition across both monolingual and code-switched corpora.
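
The conditional factorization described in the abstract can be sketched as follows. The notation is an assumption introduced here for illustration and may differ from the paper's: X is the acoustic input, Y is the bilingual output sequence, and Z^M and Z^E are hypothetical frame-synchronous Mandarin and English label sequences. The first line is the law of total probability; the second line applies the assumption that Y depends on X only through the monolingual label sequences.

```latex
% Illustrative notation only (not taken from the paper):
%   X   : acoustic input
%   Y   : bilingual output (may or may not be code-switched)
%   Z^M : frame-level Mandarin label sequence
%   Z^E : frame-level English label sequence
\begin{align}
p(Y \mid X)
  &= \sum_{Z^M,\, Z^E} p(Y \mid Z^M, Z^E, X)\; p(Z^M, Z^E \mid X) \\
  &\approx \sum_{Z^M,\, Z^E} p(Y \mid Z^M, Z^E)\; p(Z^M, Z^E \mid X)
\end{align}
```

Under this reading, the term p(Z^M, Z^E | X) corresponds to the monolingual sub-tasks and the term p(Y | Z^M, Z^E) corresponds to producing the final bilingual output from monolingual information alone.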

