AS | CL | SD | Nov 2, 2022

Monolingual Recognizers Fusion for Code-switching Speech Recognition

arXiv:2211.01046v1 | 4 citations | h-index: 31 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses code-switching speech recognition, a domain-specific challenge for multilingual communication, with incremental improvements in model fusion and training efficiency.

The paper tackles the problem of code-switching speech recognition by proposing a monolingual recognizers fusion method that reduces reliance on code-switching data and improves performance. Experiments on a Mandarin-English corpus show a significant reduction in the mix error rate using open-source pre-trained models.

The bi-encoder structure has been intensively investigated in code-switching (CS) automatic speech recognition (ASR). However, most existing methods require the two monolingual ASR models (MAMs) to share the same structure, and they use only the encoders of the MAMs. As a result, pre-trained MAMs cannot be promptly and fully exploited for CS ASR. In this paper, we propose a monolingual recognizers fusion method for CS ASR. It has two stages: the speech awareness (SA) stage and the language fusion (LF) stage. In the SA stage, two independent MAMs map the acoustic features to two language-specific predictions. To keep each MAM focused on its own language, we further extend the language-aware training strategy for the MAMs. In the LF stage, the BELM fuses the two language-specific predictions into the final prediction. Moreover, we propose a text simulation strategy to simplify the training process of the BELM and reduce reliance on CS data. Experiments on a Mandarin-English corpus show the efficiency of the proposed method. The mix error rate is significantly reduced on the test set after using open-source pre-trained MAMs.
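
The two-stage data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the MAMs and the BELM are learned models, while everything here is a hypothetical toy stand-in. The names `make_toy_mam`, `toy_fuse`, `recognize_code_switched`, and the `<other>` placeholder are assumptions made for illustration, as is the idea that each recognizer emits one token per segment so the two predictions line up position by position.

```python
from typing import Callable, List, Tuple

# Hypothetical placeholder a recognizer emits for speech outside its language.
# The abstract does not say how out-of-language speech is represented.
OTHER = "<other>"

# Toy stand-in for acoustic features: (language, token) pairs, one per segment.
Segments = List[Tuple[str, str]]
Tokens = List[str]
Recognizer = Callable[[Segments], Tokens]
Fuser = Callable[[Tokens, Tokens], Tokens]


def make_toy_mam(language: str) -> Recognizer:
    """Build a toy monolingual recognizer (assumption, not a real ASR model)."""
    def mam(segments: Segments) -> Tokens:
        # Keep tokens of this recognizer's language, mask everything else.
        return [tok if lang == language else OTHER for lang, tok in segments]
    return mam


def toy_fuse(zh_pred: Tokens, en_pred: Tokens) -> Tokens:
    """Toy stand-in for the LF stage: a rule-based merge, not a learned BELM."""
    # Assumes both predictions have the same length and are position-aligned.
    return [en if zh == OTHER else zh for zh, en in zip(zh_pred, en_pred)]


def recognize_code_switched(
    segments: Segments,
    mandarin_mam: Recognizer,
    english_mam: Recognizer,
    fuse: Fuser,
) -> Tokens:
    # SA stage: two independent recognizers produce language-specific predictions.
    zh_pred = mandarin_mam(segments)
    en_pred = english_mam(segments)
    # LF stage: fuse the two predictions into the final prediction.
    return fuse(zh_pred, en_pred)


segments = [("zh", "我"), ("en", "love"), ("zh", "你")]
result = recognize_code_switched(
    segments, make_toy_mam("zh"), make_toy_mam("en"), toy_fuse
)
print(result)  # ['我', 'love', '你']
```

Because the two recognizers are passed in as independent callables, they need not share a structure, which mirrors the motivation stated in the abstract. The rule-based merge only shows where fusion happens in the pipeline and says nothing about how the BELM actually works.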

