SD, LG, AS · Jun 26, 2024

SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR

arXiv:2406.18021v1 · 7 citations
Originality: Incremental advance
AI Analysis

This work addresses code-switching ASR for multilingual speech applications, but it appears incremental as it builds on existing MoE and Conformer architectures.

The paper tackles unified streaming and non-streaming code-switching automatic speech recognition by proposing SC-MoE, a Switch-Conformer-based mixture-of-experts system that significantly improves performance over baselines at comparable computational efficiency.

In this work, we propose SC-MoE, a Switch-Conformer-based MoE system for unified streaming and non-streaming code-switching (CS) automatic speech recognition (ASR). To achieve a real-time streaming CS ASR system, we design a streaming MoE layer in the encoder of SC-MoE that consists of three language experts, corresponding to Mandarin, English, and blank, respectively, and is equipped with a language identification (LID) network with a Connectionist Temporal Classification (CTC) loss as its router. To further utilize the language information embedded in text, we also incorporate MoE layers into the decoder of SC-MoE. In addition, we introduce routers into every MoE layer of the encoder and the decoder, which yields better recognition performance. Experimental results show that SC-MoE significantly improves CS ASR performance over the baseline while maintaining comparable computational efficiency.
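The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Switch-style top-1 routing, in which each frame goes to the single expert with the highest router probability and the expert output is scaled by that probability. The abstract names the three experts and the LID router but does not specify the gating rule. The names `route_frame`, `moe_layer`, and `make_scaling_expert` are hypothetical, and the "experts" here are toy scaling functions standing in for feed-forward networks.

```python
import math
from typing import Callable, List, Sequence, Tuple

# The three language experts named in the abstract.
LANGUAGES = ("mandarin", "english", "blank")

Expert = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_scaling_expert(scale: float) -> Expert:
    """Hypothetical stand-in for an expert feed-forward network."""
    return lambda frame: [scale * x for x in frame]


def route_frame(
    frame: Sequence[float],
    lid_logits: Sequence[float],
    experts: Sequence[Expert],
) -> Tuple[str, List[float]]:
    """Send one frame to the top-1 expert chosen by the LID router.

    Assumption: top-1 (Switch-style) gating, with the expert output
    scaled by the router probability of the chosen expert.
    """
    probs = softmax(lid_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    output = [probs[best] * y for y in experts[best](frame)]
    return LANGUAGES[best], output


def moe_layer(
    frames: Sequence[Sequence[float]],
    lid_logits_per_frame: Sequence[Sequence[float]],
    experts: Sequence[Expert],
) -> List[Tuple[str, List[float]]]:
    """Route frames one at a time, using only each frame's own LID logits."""
    return [
        route_frame(frame, logits, experts)
        for frame, logits in zip(frames, lid_logits_per_frame)
    ]


if __name__ == "__main__":
    toy_experts = [
        make_scaling_expert(1.0),  # "mandarin" expert
        make_scaling_expert(2.0),  # "english" expert
        make_scaling_expert(0.0),  # "blank" expert
    ]
    toy_frames = [[1.0, 2.0], [3.0, 4.0]]
    toy_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
    for language, output in moe_layer(toy_frames, toy_logits, toy_experts):
        print(language, output)
```

In this sketch each routing decision depends only on the current frame's router logits, so no future context is needed at the MoE layer. That is one simple way such a layer could be made compatible with streaming. How the actual system produces the LID logits under streaming constraints is not described in the abstract.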

