CLAS
Jul 18, 2025

Optimizing ASR for Catalan-Spanish Code-Switching: A Comparative Analysis of Methodologies

arXiv:2507.13875v1
h-index: 9
INTERSPEECH
Originality: Synthesis-oriented
AI Analysis

This work addresses a domain-specific problem relevant to multilingual societies where code-switching is common. Its contribution is incremental: it applies existing methods to new data.

The paper tackles automatic speech recognition for Catalan-Spanish code-switching by comparing strategies such as synthetic data generation and fine-tuning Whisper models with language tokens. The best transcription performance comes from combining a modest amount of synthetic code-switched data with the dominant language token.

Code-switching (CS), the alternating use of two or more languages, challenges automatic speech recognition (ASR) due to scarce training data and linguistic similarities. The lack of dedicated CS datasets limits ASR performance, as most models rely on monolingual or mixed-language corpora that fail to reflect real-world CS patterns. This issue is critical in multilingual societies where CS occurs in informal and formal settings. A key example is Catalan-Spanish CS, widely used in media and parliamentary speeches. In this work, we improve ASR for Catalan-Spanish CS by exploring three strategies: (1) generating synthetic CS data, (2) concatenating monolingual audio, and (3) leveraging real CS data with language tokens. We extract CS data from Catalan speech corpora and fine-tune OpenAI's Whisper models, making them available on Hugging Face. Results show that combining a modest amount of synthetic CS data with the dominant language token yields the best transcription performance.
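As a rough illustration of strategy (2), the sketch below joins monolingual utterances into a single pseudo code-switched training example and picks a language token for it. This is not the authors' pipeline: the `Utterance` type, the `concat_utterances` and `decoder_prefix` helpers, the 0.1 s silence gap, and the rule that the "dominant" language is the one contributing the most audio are all assumptions made for this example. The special-token strings follow Whisper's usual decoder prompt layout.

```python
from dataclasses import dataclass
from typing import Dict, List

SAMPLE_RATE = 16_000  # Hz; Whisper models expect 16 kHz mono audio


@dataclass
class Utterance:
    """Hypothetical container for one training example."""
    samples: List[float]  # mono waveform at SAMPLE_RATE
    text: str             # reference transcript
    lang: str             # language code, e.g. "ca" or "es"


def concat_utterances(parts: List[Utterance], gap_s: float = 0.1) -> Utterance:
    """Join monolingual utterances into one pseudo code-switched example.

    The returned `lang` is the language with the most audio samples,
    which is this sketch's assumed definition of "dominant language".
    """
    if not parts:
        raise ValueError("need at least one utterance")

    gap = [0.0] * round(gap_s * SAMPLE_RATE)  # short silence between parts
    samples: List[float] = []
    duration: Dict[str, int] = {}

    for i, part in enumerate(parts):
        if i > 0:
            samples.extend(gap)
        samples.extend(part.samples)
        duration[part.lang] = duration.get(part.lang, 0) + len(part.samples)

    dominant = max(duration, key=duration.get)
    text = " ".join(part.text.strip() for part in parts)
    return Utterance(samples, text, dominant)


def decoder_prefix(lang: str) -> str:
    """Build a Whisper-style decoder prompt carrying the language token."""
    return f"<|startoftranscript|><|{lang}|><|transcribe|><|notimestamps|>"


# Usage: two seconds of Catalan followed by one second of Spanish.
ca = Utterance([0.1] * 32_000, "Bon dia a tothom", "ca")
es = Utterance([0.2] * 16_000, "muchas gracias", "es")
mixed = concat_utterances([ca, es])
prefix = decoder_prefix(mixed.lang)
```

In a real fine-tuning setup, the waveform would be converted to log-Mel features and the prefix would be produced by the model's tokenizer rather than built as a string. The string form here only shows where the language token sits.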

