CL, AI, LG, AS · Oct 17, 2024

Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR

arXiv:2410.13445v1 · 22 citations · h-index: 22 · MRL
AI Analysis

This work addresses ASR for low-resource languages but is incremental, since it combines existing techniques.

The paper tackles low-resource automatic speech recognition by combining parameter-efficient fine-tuning with text-only adaptation in a multilingual multimodal model, achieving up to a 17% relative WER reduction in a zero-shot cross-lingual setting.
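A "relative" WER reduction is measured against the baseline's own error rate: (WER_baseline − WER_adapted) / WER_baseline. The minimal sketch below assumes the standard word-level WER definition (word edit distance divided by the number of reference words). The function names and the 50.0 → 41.5 example are hypothetical illustrations, not values reported in the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Fraction of the baseline's errors removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers: a baseline at 50.0 WER improved to 41.5 WER.
print(relative_reduction(50.0, 41.5))
```

Note that a 17% relative reduction is much smaller in absolute terms when the baseline WER is low: on a hypothetical 20.0 WER baseline it would correspond to 3.4 WER points.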

Automatic speech recognition (ASR) for low-resource languages remains a challenge due to the scarcity of labeled training data. Parameter-efficient fine-tuning and text-only adaptation are two popular methods that have been used to address such low-resource settings. In this work, we investigate how these techniques can be effectively combined using a multilingual multimodal model like SeamlessM4T. Multimodal models can leverage unlabeled text via text-only adaptation, followed by parameter-efficient ASR fine-tuning, thus boosting ASR performance. We also show cross-lingual transfer from a high-resource language, achieving up to a 17% relative WER reduction over a baseline in a zero-shot setting without any labeled speech.
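Parameter-efficient fine-tuning trains a small set of new parameters while the pretrained weights stay frozen. The abstract does not name the specific method, so the sketch below assumes LoRA (low-rank adaptation), one widely used option: a frozen weight matrix W is augmented with a trainable low-rank update B·A scaled by alpha/rank. The helper names and the 1024×1024 layer size are hypothetical and are not taken from SeamlessM4T or the paper.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def full_finetune_params(d_out, d_in):
    """Trainable values when the whole d_out x d_in weight is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable values in LoRA: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def lora_forward(W, A, B, x, alpha, rank):
    """Frozen projection W @ x plus the scaled low-rank update B @ (A @ x).

    Only A and B would be trained (assumption: standard LoRA setup);
    W is read but never modified here.
    """
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


# Hypothetical 1024 x 1024 projection adapted with rank 8.
print(full_finetune_params(1024, 1024), lora_params(1024, 1024, 8))
```

For this hypothetical layer, LoRA trains 16,384 values instead of 1,048,576, roughly 1.6% of the full matrix. Initializing B to zeros, as the original LoRA formulation does, makes the adapted layer start out identical to the frozen one, so fine-tuning begins from the pretrained model's behavior.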
