CL, SD, AS (Jun 17, 2025)

AsyncSwitch: Asynchronous Text-Speech Adaptation for Code-Switched ASR

arXiv:2506.14190v1, 1 citation, h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the problem of limited code-switched speech data for ASR developers, offering an incremental improvement over prior synthetic audio methods.

The paper introduces AsyncSwitch, an asynchronous adaptation framework for code-switched ASR. It first exposes the model to large-scale code-switched text and only then fine-tunes on paired speech-text data, achieving a 9.02% relative WER reduction on Malay-English code-switching.
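For readers unfamiliar with the metric, a relative WER reduction compares the adapted model's word error rate against a baseline, as a fraction of the baseline. The sketch below shows only that arithmetic; the function name and the example numbers are illustrative assumptions and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both inputs are word error rates on the same scale (for example,
    both in percent). A positive result means the adapted model is better.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical numbers for illustration only (not results from the paper):
# a baseline WER of 20.0 and an adapted WER of 18.0.
example = relative_wer_reduction(20.0, 18.0)
print(f"{example:.2f}% relative reduction")
```

Note that a relative reduction is not an absolute one: a 9.02% relative reduction means the adapted WER is about 0.91 times the baseline WER, whatever the baseline happens to be.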

Developing code-switched ASR systems is challenging due to language ambiguity and limited exposure to multilingual, code-switched data, while collecting such speech is costly. Prior work generates synthetic audio from text, but these methods are computationally intensive and hard to scale. We introduce AsyncSwitch, a novel asynchronous adaptation framework that leverages large-scale, text-rich web data to pre-expose ASR models to diverse code-switched domains before fine-tuning on paired speech-text corpora. Our three-stage process (1) trains decoder self-attention and feedforward layers on code-switched text, (2) aligns decoder and encoder via cross-attention using limited speech-text data, and (3) fully fine-tunes the entire model. Experiments with Whisper on Malay-English code-switching demonstrate a 9.02% relative WER reduction, while improving monolingual performance in Singlish, Malay, and other English variants.
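The three-stage schedule in the abstract can be read as a rule for which parameters are trainable at each stage. The following is a minimal sketch of that rule, not the authors' implementation. It assumes parameter names in the style of Hugging Face Whisper checkpoints (`decoder`, `self_attn`, `encoder_attn`, `fc1`, `fc2`), and it assumes that stage 2 trains only the cross-attention weights and that layer norms stay frozen in stages 1 and 2; the abstract does not specify either detail.

```python
# Name fragments per stage. These are assumptions modeled on Hugging Face-style
# Whisper parameter names, not details confirmed by the paper.
STAGE_FRAGMENTS = {
    1: ("self_attn.", "fc1.", "fc2."),  # decoder self-attention + feedforward
    2: ("encoder_attn.",),              # decoder-to-encoder cross-attention
}


def is_trainable(param_name: str, stage: int) -> bool:
    """Decide whether a parameter is updated in the given stage (1, 2, or 3)."""
    if stage == 3:
        return True  # stage 3: full fine-tuning of the entire model
    if stage not in STAGE_FRAGMENTS:
        raise ValueError(f"unknown stage: {stage}")
    # Stages 1 and 2 touch only the decoder; the encoder stays frozen.
    in_decoder = "decoder." in param_name
    return in_decoder and any(f in param_name for f in STAGE_FRAGMENTS[stage])


# Illustrative parameter names (hypothetical examples of the assumed naming).
names = [
    "model.encoder.layers.0.self_attn.q_proj.weight",
    "model.decoder.layers.0.self_attn.q_proj.weight",
    "model.decoder.layers.0.encoder_attn.k_proj.weight",
    "model.decoder.layers.0.fc1.weight",
]
for stage in (1, 2, 3):
    trainable = [n for n in names if is_trainable(n, stage)]
    print(stage, trainable)
```

With a PyTorch model, the same predicate could be applied in a loop over `model.named_parameters()`, setting `param.requires_grad = is_trainable(name, stage)` before building the optimizer for each stage.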

