ASCL Mar 10, 2025

Building English ASR model with regional language support

arXiv:2503.07522v1
h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the need for multilingual ASR in regions such as India, where English and Hindi are both commonly used. It is incremental in that it builds on existing ASR methods with new adaptations.

The paper tackles the problem of building an English ASR system that also handles Hindi queries without losing accuracy on English. Compared to a monolingual English model, it achieves a 69.3% relative reduction in word error rate on the Hindi test set and a 5.7% relative reduction on the English test set.
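To make "relative reduction" concrete, here is a minimal sketch of the arithmetic. The function name and the WER values in the usage note are hypothetical illustrations, not figures from the paper; only the percentage convention is assumed to match the one used above.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline.

    Hypothetical helper for illustration; both inputs are word error
    rates in the same units (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

For example, a made-up baseline WER of 10.0 that drops to 9.43 corresponds to a relative reduction of about 5.7%, even though the absolute change is only 0.57 points.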

In this paper, we present a novel approach to developing an English Automatic Speech Recognition (ASR) system that can effectively handle Hindi queries without compromising its performance on English. We propose a novel acoustic model (AM), referred to as the SplitHead with Attention (SHA) model, which features shared hidden layers across languages and language-specific projection layers combined via a self-attention mechanism. This mechanism estimates a weight for each language from the input data and weights the corresponding language-specific projection layers accordingly. Additionally, we propose a language modeling approach that interpolates n-gram models built from English and transliterated Hindi text corpora. Our results demonstrate the effectiveness of this approach, with 69.3% and 5.7% relative reductions in word error rate on the Hindi and English test sets, respectively, compared to a monolingual English model.
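The two ideas in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: a per-frame softmax gate stands in for the self-attention mechanism (whose exact form is not given here), the layer sizes and random weights are placeholders, linear interpolation is assumed for the language model, and every class, function, and parameter name is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class SplitHeadAttentionSketch:
    """Shared hidden layer, one projection head per language, and a
    gate that mixes the heads per frame (assumed simplification)."""

    def __init__(self, in_dim, hidden_dim, out_dim, n_langs=2, seed=0):
        rng = np.random.default_rng(seed)
        # Hidden layer shared by all languages.
        self.W_shared = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
        # One language-specific projection per language.
        self.W_heads = rng.normal(0.0, 0.1, (n_langs, hidden_dim, out_dim))
        # Gate that estimates a weight per language from the input.
        self.W_gate = rng.normal(0.0, 0.1, (hidden_dim, n_langs))

    def forward(self, feats):
        # feats: (T, in_dim) acoustic features for T frames.
        h = np.tanh(feats @ self.W_shared)                  # (T, hidden)
        lang_w = softmax(h @ self.W_gate)                   # (T, n_langs)
        heads = np.einsum("th,lho->tlo", h, self.W_heads)   # (T, n_langs, out)
        mixed = (lang_w[..., None] * heads).sum(axis=1)     # (T, out)
        return softmax(mixed), lang_w


def interpolate_ngram(p_en, p_hi, lam=0.5):
    """Linearly interpolate two n-gram distributions over the next word
    for a fixed history: lam * p_en + (1 - lam) * p_hi."""
    vocab = set(p_en) | set(p_hi)
    return {w: lam * p_en.get(w, 0.0) + (1.0 - lam) * p_hi.get(w, 0.0)
            for w in vocab}
```

Calling `SplitHeadAttentionSketch(40, 64, 100).forward(feats)` on a `(T, 40)` feature array returns per-frame output posteriors and the per-frame language weights. In `interpolate_ngram`, the weight `lam` would normally be tuned on held-out data; the value 0.5 is only a placeholder.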

