CL | SD | AS | Dec 12, 2023

Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification

Peking U
arXiv:2312.07338v1 | 13 citations | h-index: 19 | ICASSP
Originality: Incremental advance
AI Analysis

This paper addresses domain mismatch for researchers and practitioners in multilingual speech processing, offering an incremental improvement over existing pre-training methods.

The paper tackles domain mismatch in multilingual speech models by proposing self-supervised adaptive pre-training (SAPT), which improves the XLSR-128 model's performance on language identification, with gains of up to 40.1% for under-represented languages on the FLEURS benchmark and better sample efficiency in few-shot settings.

Pre-trained Transformer-based speech models have shown striking performance when fine-tuned on various downstream tasks such as automatic speech recognition and spoken language identification (SLID). However, the problem of domain mismatch remains a challenge in this area, where the domain of the pre-training data might differ from that of the downstream labeled data used for fine-tuning. In multilingual tasks such as SLID, the pre-trained speech model may not support all the languages in the downstream task. To address this challenge, we propose self-supervised adaptive pre-training (SAPT) to adapt the pre-trained model to the target domain and languages of the downstream task. We apply SAPT to the XLSR-128 model and investigate the effectiveness of this approach for the SLID task. First, we demonstrate that SAPT improves XLSR performance on the FLEURS benchmark with substantial gains up to 40.1% for under-represented languages. Second, we apply SAPT on four different datasets in a few-shot learning setting, showing that our approach improves the sample efficiency of XLSR during fine-tuning. Our experiments provide strong empirical evidence that continual adaptation via self-supervision improves downstream performance for multilingual speech models.
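
The two-stage idea in the abstract (self-supervised adaptation first, few-shot fine-tuning second) can be illustrated at the data-preparation level. The sketch below is not the authors' code: the function name `make_sapt_splits`, the `(audio_path, language)` record format, and the choice to draw the unlabeled adaptation pool from the same downstream dataset are all assumptions made for illustration. It shows only that the adaptation stage needs audio without labels, while fine-tuning uses a small labelled subset of at most `k` utterances per language.

```python
import random
from collections import defaultdict


def make_sapt_splits(examples, k, seed=0):
    """Split a labelled dataset into inputs for the two stages.

    examples: list of (audio_path, language) pairs (hypothetical format).
    Returns (unlabeled_pool, few_shot_set).
    """
    rng = random.Random(seed)

    # Stage 1 (adaptive pre-training) is self-supervised, so it only
    # needs the audio; language labels are dropped here.
    unlabeled_pool = [path for path, _ in examples]

    # Group utterances by language for per-language sampling.
    by_lang = defaultdict(list)
    for path, lang in examples:
        by_lang[lang].append(path)

    # Stage 2 (few-shot fine-tuning) sees at most k labelled
    # utterances per language.
    few_shot_set = []
    for lang in sorted(by_lang):
        paths = by_lang[lang]
        chosen = rng.sample(paths, min(k, len(paths)))
        few_shot_set.extend((p, lang) for p in chosen)

    return unlabeled_pool, few_shot_set


# Toy data: three languages, five utterances each (made-up file names).
examples = [
    (f"{lang}_{i}.wav", lang)
    for lang in ("am", "sw", "yo")
    for i in range(5)
]
pool, shots = make_sapt_splits(examples, k=2)
```

In a real setup, `pool` would feed the continued self-supervised training of the pre-trained model and `shots` the supervised fine-tuning for SLID; the model and training loop are deliberately left out of this sketch.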
