Building English ASR model with regional language support
This work addresses the need for multilingual ASR in regions such as India, where English and Hindi are both commonly used. The contribution is incremental: it builds on existing ASR methods with novel adaptations.
The paper tackles the problem of building an English ASR system that also handles Hindi queries without degrading English performance, achieving a 69.3% relative reduction in word error rate on Hindi and a 5.7% relative reduction on English compared to a monolingual English model.
In this paper, we present an approach to developing an English Automatic Speech Recognition (ASR) system that can effectively handle Hindi queries without compromising its performance on English. We propose a novel acoustic model (AM), referred to as the SplitHead with Attention (SHA) model, which features shared hidden layers across languages and language-specific projection layers combined via a self-attention mechanism. This mechanism estimates a weight for each language from the input data and weights the outputs of the corresponding language-specific projection layers accordingly. Additionally, we propose a language modeling approach that interpolates n-gram models trained on English and transliterated Hindi text corpora. Our results demonstrate the effectiveness of this approach, with 69.3% and 5.7% relative reductions in word error rate on the Hindi and English test sets, respectively, compared to a monolingual English model.
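As an illustration only, the two ideas in the abstract can be sketched in a few lines of plain Python. This is not the authors' implementation. The abstract does not say how the attention weights are computed, so the sketch assumes a softmax over a learned linear scoring of the shared hidden vector. It also assumes simple linear interpolation for the language model. All names (`sha_combine`, `interpolate_ngram`, `attention_matrix`, `lam`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sha_combine(hidden, lang_projections, attention_matrix):
    """Combine language-specific projections of a shared hidden vector.

    hidden           -- output of the shared hidden layers for one frame
    lang_projections -- one projection matrix per language (assumed linear)
    attention_matrix -- one scoring row per language (assumed form; the
                        abstract does not specify the attention details)
    """
    # One weight per language, estimated from the input representation.
    lang_weights = softmax(matvec(attention_matrix, hidden))
    # Each language-specific head projects the same shared representation.
    heads = [matvec(proj, hidden) for proj in lang_projections]
    # Weighted sum of the heads, element by element.
    combined = [
        sum(w * head[i] for w, head in zip(lang_weights, heads))
        for i in range(len(heads[0]))
    ]
    return combined, lang_weights


def interpolate_ngram(p_english, p_hindi, lam):
    """Linearly interpolate two n-gram probabilities for the same word.

    lam is the weight of the English model (assumed to lie in [0, 1]).
    """
    return lam * p_english + (1.0 - lam) * p_hindi


if __name__ == "__main__":
    # Toy values chosen purely for illustration.
    hidden = [0.5, -1.0, 2.0]
    english_head = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    hindi_head = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
    attention = [[0.2, 0.1, -0.3], [-0.1, 0.0, 0.4]]
    output, weights = sha_combine(hidden, [english_head, hindi_head], attention)
    print("language weights:", weights)
    print("combined projection:", output)
    print("interpolated probability:", interpolate_ngram(0.02, 0.005, 0.7))
```

In this sketch the language weights always sum to one, so the combined output is a convex mixture of the two heads. In a trained system the interpolation weight `lam` would typically be tuned on held-out data.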