AS | CL | SD | Jul 6, 2020

Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters

arXiv:2007.03001v2 | 165 citations
AI Analysis

This work addresses the challenge of deploying ASR systems across diverse languages, with particular benefit for low-resource language communities. It is incremental: it builds on existing multilingual training methods but applies them at a larger scale.

The paper aims to improve automatic speech recognition (ASR) for low-resource languages by training a single multilingual model across 51 languages. Depending on the model variant, it achieves average relative word error rate (WER) reductions of 20.9% to 28.8% over monolingual baselines.

We study training a single acoustic model for multiple languages with the aim of improving automatic speech recognition (ASR) performance on low-resource languages, and overall simplifying deployment of ASR systems that support diverse languages. We perform an extensive benchmark on 51 languages, with varying amounts of training data per language (from 100 hours to 1100 hours). We compare three variants of multilingual training: a single joint model that does not know the input language, a joint model that uses this information, and a model with multiple heads (one per language cluster). We show that multilingual training of ASR models on several languages can improve recognition performance, in particular on low-resource languages. We see average relative WER reductions of 20.9%, 23% and 28.8% compared to monolingual baselines for the joint model, the joint model with language input, and the multi-head model, respectively. To our knowledge, this is the first work studying multilingual ASR at massive scale, with more than 50 languages and more than 16,000 hours of audio across them.
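The headline figures are average relative WER reductions, so a small sketch of that arithmetic may help. It assumes the average is taken over per-language relative reductions, which the abstract does not spell out. The function names and WER values below are hypothetical illustrations, not results from the paper.

```python
from typing import Iterable, Tuple


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def average_relative_reduction(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean of per-language relative reductions (assumed averaging scheme)."""
    reductions = [relative_wer_reduction(b, m) for b, m in pairs]
    if not reductions:
        raise ValueError("need at least one (baseline, new) pair")
    return sum(reductions) / len(reductions)


# Hypothetical (monolingual WER, multilingual WER) pairs for three languages.
example = [(40.0, 30.0), (20.0, 17.0), (50.0, 35.0)]
print(f"{average_relative_reduction(example):.1f}% average relative WER reduction")
```

A relative reduction differs from an absolute one: going from 40% to 30% WER is a 10-point absolute drop but a 25% relative reduction, which is the kind of figure the abstract reports.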

