CL · SD · AS · Jun 2, 2023

DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model

arXiv:2306.01303v1 · 2 citations · h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses the need for lightweight models in speech recognition for low-resource languages, though it is incremental as it builds on existing distillation and cross-lingual techniques.

The paper tackles the problem of compressing large multilingual self-supervised speech representation models for industrial use by proposing DistilXLSR, a distilled cross-lingual model that reduces parameters by 50% while maintaining representation ability across 15 low-resource languages.

Multilingual self-supervised speech representation models have greatly enhanced the speech recognition performance for low-resource languages, and the compression of these huge models has also become a crucial prerequisite for their industrial application. In this paper, we propose DistilXLSR, a distilled cross-lingual speech representation model. By randomly shuffling the phonemes of existing speech, we reduce the linguistic information and distill cross-lingual models using only English data. We also design a layer-jumping initialization method to fully leverage the teacher's pre-trained weights. Experiments on 2 kinds of teacher models and 15 low-resource languages show that our method can reduce the parameters by 50% while maintaining cross-lingual representation ability. Our method is proven to be generalizable to various languages/teacher models and has the potential to improve the cross-lingual performance of the English pre-trained models.
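The two ideas named in the abstract, shuffling phoneme-level segments of an utterance and initializing the student from a subset of teacher layers, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the use of pre-computed segment boundaries (for example from a forced aligner), and the evenly strided layer mapping are all assumptions made for illustration; the abstract does not specify how boundaries are obtained or which teacher layers are copied.

```python
import random


def shuffle_segments(samples, boundaries, seed=None):
    """Reorder a waveform by shuffling its segments.

    samples:    sequence of audio samples.
    boundaries: sorted sample indices that split `samples` into segments.
                Hypothetical input: assumed to mark phoneme boundaries.
    seed:       optional seed for reproducible shuffling.
    """
    rng = random.Random(seed)
    edges = [0, *boundaries, len(samples)]
    # Slice the waveform into consecutive segments.
    segments = [samples[a:b] for a, b in zip(edges, edges[1:])]
    # Permute the segment order; samples inside a segment stay in order.
    rng.shuffle(segments)
    return [x for seg in segments for x in seg]


def layer_jumping_map(num_teacher_layers, num_student_layers):
    """Map each student layer index to a teacher layer index.

    Assumption: the student copies teacher layers at an even stride
    (student layer i takes teacher layer i * stride). The paper's exact
    mapping may differ.
    """
    if num_teacher_layers % num_student_layers != 0:
        raise ValueError("teacher depth must be a multiple of student depth")
    stride = num_teacher_layers // num_student_layers
    return {i: i * stride for i in range(num_student_layers)}


if __name__ == "__main__":
    audio = list(range(10))            # stand-in for real audio samples
    shuffled = shuffle_segments(audio, [3, 7], seed=0)
    print(len(shuffled) == len(audio))  # same samples, new segment order
    print(layer_jumping_map(24, 12))    # which teacher layer seeds each student layer
```

In a real pipeline the mapping would select which pre-trained transformer blocks to copy into the smaller student before distillation, and the shuffled audio would replace the original utterances as distillation input.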

Code Implementations: 1 repo
