SDAILGNEASJan 19, 2023

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition

NVIDIA
arXiv:2301.07851v134 citationsh-index: 69
Originality Incremental advance
AI Analysis

This work addresses the problem of adapting speech recognition models to multiple languages efficiently for researchers and practitioners, though it is incremental as it builds on existing model reprogramming and ASR techniques.

The paper tackles cross-lingual speech recognition by proposing a parameter-efficient model reprogramming framework that repurposes English ASR models for other languages, achieving competitive word error rates (WER) of 8.1% to 11.9% while using only 4.2% to 6.8% of the original trainable parameters.

In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can \textbf{re-purpose} well-trained English automatic speech recognition (ASR) models to recognize the other languages. We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement that, for the first time, empowers model reprogramming on ASR. Specifically, we investigate how to select trainable components (i.e., encoder) of a conformer-based RNN-Transducer, as a frozen pre-trained backbone. Experiments on a seven-language multilingual LibriSpeech speech (MLS) task show that model reprogramming only requires 4.2% (11M out of 270M) to 6.8% (45M out of 660M) of its original trainable parameters from a full ASR model to perform competitive results in a range of 11.9% to 8.1% WER averaged across different languages. In addition, we discover different setups to make large-scale pre-trained ASR succeed in both monolingual and multilingual speech recognition. Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses (e.g., w2v-bert) in terms of lower WER and better training efficiency.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes