CL | AI | SD | AS | Aug 20, 2024

XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition

arXiv:2408.10524v1 | h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses a specific problem in code-switching speech recognition for bilingual settings and represents an incremental improvement over existing contextualized ASR models.

The paper tackles bilingual code-switching speech recognition by introducing a Cross-lingual Contextual Biasing (XCB) module, which improves recognition of secondary-language phrases without extra inference overhead. The approach is validated on an in-house dataset and the ASRU-2019 test set.

Contextualized ASR models have been demonstrated to effectively improve the recognition accuracy of uncommon phrases when a predefined phrase list is available. However, these models often struggle in bilingual settings, which are prevalent in code-switching speech recognition. In this study, we make an initial attempt to address this challenge by introducing a Cross-lingual Contextual Biasing (XCB) module. Specifically, we augment a pre-trained ASR model for the dominant language by integrating an auxiliary language biasing module and a supplementary language-specific loss, aimed at enhancing the recognition of phrases in the secondary language. Experiments conducted on our in-house code-switching dataset validate the efficacy of our approach, demonstrating significant improvements in the recognition of biasing phrases in the secondary language without any additional inference overhead. Additionally, our proposed system exhibits both efficiency and generalization when applied to the unseen ASRU-2019 test set.
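
The abstract mentions a "supplementary language-specific loss" but does not say how it is formed or combined with the main ASR objective. As a rough illustration only, the sketch below shows one common way such an auxiliary term could be added: a main cross-entropy over all tokens plus a weighted cross-entropy restricted to secondary-language tokens. The function names, the `is_secondary` mask, and the `aux_weight` value are all hypothetical and are not taken from the paper.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(probs[target])


def combined_loss(step_probs, targets, is_secondary, aux_weight=0.3):
    """Main loss over all tokens plus a weighted loss over secondary-language tokens.

    ASSUMPTION: this weighted-sum form and the aux_weight default are
    illustrative; the paper's actual loss may be defined differently.
    """
    per_token = [cross_entropy(p, t) for p, t in zip(step_probs, targets)]
    main = sum(per_token) / len(per_token)

    # Keep only the losses at positions flagged as secondary-language tokens.
    secondary = [loss for loss, flag in zip(per_token, is_secondary) if flag]
    aux = sum(secondary) / len(secondary) if secondary else 0.0

    return main + aux_weight * aux


# Toy example: two decoding steps over a two-token vocabulary,
# where only the second token belongs to the secondary language.
toy_probs = [[0.5, 0.5], [0.25, 0.75]]
toy_loss = combined_loss(toy_probs, targets=[0, 1], is_secondary=[False, True])
```

Because the auxiliary term in this sketch only affects training, it would add no cost at inference time, which is at least consistent with the abstract's claim of no additional inference overhead.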

