AS · CL · SD · Sep 16, 2024

SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition

arXiv:2409.10429v2 · 3 citations · h-index: 5
AI Analysis

This work addresses the challenge of adapting ASR models to low-resource languages without fine-tuning, which benefits multilingual speech processing applications.

The paper tackles low-resource automatic speech recognition by introducing SMILE, a framework that combines meta-learning with speech in-context learning. On the ML-SUPERB benchmark, SMILE significantly reduces character and word error rates in few-shot multilingual ASR tasks.

Automatic Speech Recognition (ASR) models demonstrate outstanding performance on high-resource languages but face significant challenges when applied to low-resource languages due to limited training data and insufficient cross-lingual generalization. Existing adaptation strategies, such as shallow fusion, data augmentation, and direct fine-tuning, either rely on external resources, suffer from computational inefficiencies, or fail in test-time adaptation scenarios. To address these limitations, we introduce Speech Meta In-Context LEarning (SMILE), an innovative framework that combines meta-learning with speech in-context learning (SICL). SMILE leverages meta-training on high-resource languages to enable robust few-shot generalization to low-resource languages without explicit fine-tuning on the target domain. Extensive experiments on the ML-SUPERB benchmark show that SMILE consistently outperforms baseline methods, significantly reducing character and word error rates in training-free few-shot multilingual ASR tasks.
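
The abstract reports results as character and word error rates without stating how they are computed. As a minimal sketch of the standard definitions (total edit distance divided by total reference length), the Python below may help make the metrics concrete. The function names `edit_distance` and `error_rate` are hypothetical and do not come from the paper or the ML-SUPERB tooling. The sketch assumes whitespace tokenization for words and counts spaces as characters, and the benchmark's own scoring may normalize text differently.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str], unit: str = "word") -> float:
    """Corpus-level WER (unit="word") or CER (unit="char").

    Assumption: words are split on whitespace, and spaces count as
    characters for CER. Real benchmarks may normalize text differently.
    """
    if unit not in ("word", "char"):
        raise ValueError("unit must be 'word' or 'char'")
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("references contain no tokens")
    return errors / total
```

For example, `error_rate(["the cat sat"], ["the cat sit"], unit="word")` scores one substituted word out of three, while `unit="char"` scores one substituted character out of eleven. CER is often the more informative of the two for languages without clear word boundaries.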

