CL, AI, LG, AS | Nov 27, 2024

AMPS: ASR with Multimodal Paraphrase Supervision

arXiv:2411.18368v2 | 11 citations | h-index: 22 | NAACL
Originality: Incremental advance
AI Analysis

This work addresses challenges in spontaneous multilingual speech recognition, offering incremental improvements for ASR systems in languages like Hindi and Marathi.

The paper aims to improve conversational automatic speech recognition (ASR) in multiple languages by augmenting a multilingual multimodal ASR system with paraphrase-based supervision, yielding relative word error rate (WER) reductions of up to 5%.

Spontaneous or conversational multilingual speech presents many challenges for state-of-the-art automatic speech recognition (ASR) systems. In this work, we present a new technique, AMPS, that augments a multilingual multimodal ASR system with paraphrase-based supervision for improved conversational ASR in multiple languages, including Hindi, Marathi, Malayalam, Kannada, and Nyanja. We use paraphrases of the reference transcriptions as additional supervision while training the multimodal ASR model and selectively invoke this paraphrase objective for utterances with poor ASR performance. Using AMPS with the state-of-the-art multimodal model SeamlessM4T, we obtain significant relative reductions in word error rates (WERs) of up to 5%. We present detailed analyses of our system using both objective and human evaluation metrics.
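
The selective paraphrase objective and the relative WER figure can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how "poor ASR performance" is measured or how the two objectives are combined, so the per-utterance WER gate, the `wer_threshold` and `paraphrase_weight` parameters, and all function names below are hypothetical choices made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def combined_loss(
    asr_loss: float,
    paraphrase_loss: float,
    utterance_wer: float,
    wer_threshold: float = 0.5,      # assumption: not a value from the paper
    paraphrase_weight: float = 1.0,  # assumption: not a value from the paper
) -> float:
    """Add the paraphrase objective only for poorly recognised utterances.

    Assumption: "poor ASR performance" is approximated here by a
    per-utterance WER above a fixed threshold.
    """
    if utterance_wer > wer_threshold:
        return asr_loss + paraphrase_weight * paraphrase_loss
    return asr_loss


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction: (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

As a reading aid for the "up to 5%" figure: with made-up numbers, a system that moves from a WER of 0.40 to 0.38 has an absolute reduction of 2 points and a relative reduction of 5%, which is what `relative_wer_reduction(0.40, 0.38)` computes. The reported result is relative, so it should not be read as a 5-point drop in WER.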
