CL | SD | AS | Jan 22, 2025

BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR

arXiv:2501.12602v1 | 4 citations | h-index: 4 | ICASSP
Originality: Incremental advance
AI Analysis

This paper addresses domain-robust multilingual ASR and offers an incremental improvement for speech recognition systems that handle multiple languages.

The paper tackles language confusion in multilingual automatic speech recognition (ASR) by proposing BLR-MoE, which extends the Mixture of Experts architecture to self-attention and improves router robustness. The authors evaluate the approach on a 10,000-hour dataset.

Recently, Mixture of Experts (MoE) architectures, such as LR-MoE, have often been used to alleviate the impact of language confusion on the multilingual ASR (MASR) task. However, they still face language confusion issues, especially in mismatched-domain scenarios. In this paper, we decouple language confusion in LR-MoE into confusion in self-attention and confusion in the router. To alleviate the language confusion in self-attention, we build on LR-MoE and propose applying an attention-MoE architecture to MASR. In our new architecture, MoE is used not only in the feed-forward network (FFN) but also in self-attention. In addition, to improve the robustness of the language identification (LID)-based router against language confusion, we propose expert pruning and router augmentation methods. Combining the above, we obtain the boosted language-routing MoE (BLR-MoE) architecture. We verify the effectiveness of the proposed BLR-MoE on a 10,000-hour MASR dataset.
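To make the routing idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The names `route`, `ToyLayer`, and `allowed` are hypothetical, the experts are toy functions standing in for real attention and FFN blocks, and treating expert pruning as "mask out experts for languages known to be out of scope" is an assumption about how pruning might be applied. Router augmentation is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(lid_logits, languages, allowed=None):
    """Top-1 language routing from LID logits.

    `allowed` is a hypothetical stand-in for expert pruning: experts whose
    language is not in this set are masked out before the argmax.
    """
    if allowed is not None and not any(lang in allowed for lang in languages):
        raise ValueError("pruning removed every expert")
    masked = [
        logit if (allowed is None or lang in allowed) else -math.inf
        for lang, logit in zip(languages, lid_logits)
    ]
    probs = softmax(masked)
    best = max(range(len(languages)), key=lambda i: probs[i])
    return languages[best], probs


def make_scale(k):
    # Toy "attention expert": scales every feature by k.
    return lambda frame: [v * k for v in frame]


def make_shift(k):
    # Toy "FFN expert": shifts every feature by k.
    return lambda frame: [v + k for v in frame]


class ToyLayer:
    """One layer with a per-language attention expert and FFN expert."""

    def __init__(self, languages):
        self.attn = {lang: make_scale(i + 1) for i, lang in enumerate(languages)}
        self.ffn = {lang: make_shift(i + 1) for i, lang in enumerate(languages)}

    def forward(self, frame, lang):
        # The same routing decision selects both experts (assumed here).
        return self.ffn[lang](self.attn[lang](frame))


languages = ["en", "zh", "es"]
lid_logits = [1.0, 2.5, 2.0]  # made-up router scores for one frame

lang_full, _ = route(lid_logits, languages)
lang_pruned, probs_pruned = route(lid_logits, languages, allowed={"en", "es"})

layer = ToyLayer(languages)
output = layer.forward([1.0, 2.0], lang_pruned)
```

In this toy setup the unpruned router picks `zh`, while restricting the experts to `en` and `es` forces the choice to `es`. That illustrates how pruning could stop a confused router from sending frames to an out-of-scope language expert.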

