AS · CL · SD · Dec 7, 2022

Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information

arXiv:2212.03476v1 · 1 citation · h-index: 34
Originality: Incremental advance
AI Analysis

This work addresses language interference in multilingual speech recognition systems, offering incremental improvements for ASR applications.

The paper tackles language interference in self-supervised multilingual speech pre-training by introducing auxiliary language information techniques, resulting in a 14.3% relative gain over the standard XLSR model and a 19.8% gain over a no pre-training baseline on a 16-language ASR task.

Multilingual end-to-end models have shown great improvement over monolingual systems. With the development of speech pre-training methods, self-supervised multilingual speech representation learning such as XLSR has succeeded in improving the performance of multilingual automatic speech recognition (ASR). However, as in supervised learning, multilingual pre-training may also suffer from language interference, which in turn limits the applicability of multilingual systems. In this paper, we introduce several techniques for improving self-supervised multilingual pre-training by leveraging auxiliary language information during the pre-training stage: language adversarial training, language embedding, and language adaptive training. We conduct experiments on a multilingual ASR task consisting of 16 languages. Our experimental results demonstrate a 14.3% relative gain over the standard XLSR model and a 19.8% relative gain over the multilingual model without pre-training.
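
The "relative gain" figures above can be read as relative reductions in an error-rate metric. The following is a minimal sketch of that arithmetic, assuming the gain is computed as (baseline error - improved error) / baseline error; the abstract does not name the metric, and the function name and the error values below are hypothetical, chosen only to illustrate the calculation. They are not results from the paper.

```python
def relative_gain(baseline_error: float, improved_error: float) -> float:
    """Relative error reduction, as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - improved_error) / baseline_error


# Hypothetical error rates (NOT taken from the paper), picked so that the
# result matches the scale of the reported 14.3% relative gain.
hypothetical_baseline_error = 10.0
hypothetical_improved_error = 8.57

gain = relative_gain(hypothetical_baseline_error, hypothetical_improved_error)
print(f"Relative gain: {gain:.1%}")
```

Under this reading, a relative gain is distinct from an absolute one: in the hypothetical case above, the absolute reduction is 1.43 points, while the relative reduction is 14.3% of the baseline.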

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
