Multilingual Transformer Language Model for Speech Recognition in Low-resource Languages
This work addresses data scarcity and the high cost of deploying speech recognition systems in low-resource languages, offering a practical approach for language technology applications.
The paper tackles the challenge of training Transformer language models for speech recognition in low-resource languages by grouping locales together, which improves performance and reduces costs compared to traditional multilingual models.
It is challenging to train and deploy Transformer language models (LMs) for second-pass re-ranking in hybrid automatic speech recognition (ASR) for low-resource languages because of (1) data scarcity, (2) the high computing cost of training and refreshing 100+ monolingual models, and (3) inefficient hosting given sparse traffic. In this study, we present a new way to group multiple low-resource locales together and optimize the performance of Multilingual Transformer LMs in ASR. Our Locale-group Multilingual Transformer LMs outperform traditional multilingual LMs while reducing maintenance costs and operating expenses. Further, for low-resource but high-traffic locales where deploying monolingual models is feasible, we show that fine-tuning our locale-group multilingual LMs produces better monolingual LM candidates than baseline monolingual LMs.
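
For readers unfamiliar with second-pass re-ranking, the following is a minimal sketch of generic n-best rescoring, not the scoring scheme used in this paper. It assumes the first pass emits hypotheses with log-domain scores and that the second-pass LM returns a log-probability per hypothesis; the names `rerank_nbest`, `lm_weight`, and `toy_lm_log_prob`, as well as the example scores, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple


def rerank_nbest(
    nbest: List[Tuple[str, float]],
    lm_log_prob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank first-pass hypotheses using a second-pass LM score.

    Each hypothesis is (text, first_pass_score). The combined score is
    first_pass_score + lm_weight * lm_log_prob(text); this log-linear
    combination is an assumption, not a formula taken from the paper.
    """
    rescored = [
        (text, first_pass + lm_weight * lm_log_prob(text))
        for text, first_pass in nbest
    ]
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in for a Transformer LM: a fixed lookup table of
# made-up log-probabilities, with a low default for unseen text.
_TOY_SCORES = {"recognize speech": -2.0, "wreck a nice beach": -9.0}


def toy_lm_log_prob(text: str) -> float:
    return _TOY_SCORES.get(text, -20.0)


if __name__ == "__main__":
    first_pass = [("wreck a nice beach", -4.0), ("recognize speech", -4.5)]
    for text, score in rerank_nbest(first_pass, toy_lm_log_prob):
        print(f"{score:.2f}\t{text}")
```

In a locale-group setting, one would expect `lm_log_prob` to be backed by the single LM shared by the group that the utterance's locale belongs to, rather than by a per-locale model; how locales are assigned to groups is not specified here and is left out of the sketch.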