CL AI | Nov 19, 2025

Building Robust and Scalable Multilingual ASR for Indian Languages

Arjun Gangwar, Kaousheik Jayakumar, S. Umesh

arXiv:2511.15418v1 | h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses robust and scalable ASR for diverse Indian languages and dialects. The advance is incremental: it builds on existing multilingual ASR methods with specific adaptations.

The paper builds multilingual automatic speech recognition (ASR) systems for 8 Indian languages spanning 33 dialects, with a focus on language and dialect identification. On Track 2, the systems beat the baseline in 3 languages in terms of word/character error rate and achieved the highest language and dialect identification accuracy among all participating teams.

This paper describes the systems developed by SPRING Lab, Indian Institute of Technology Madras, for the ASRU MADASR 2.0 challenge. The systems focus on adapting ASR models to better predict the language and dialect of an utterance among 8 languages and 33 dialects. We participated in Track 1 and Track 2, which restrict the use of additional data and require multilingual systems to be developed from scratch. We present a novel training approach using a Multi-Decoder architecture with a phonemic Common Label Set (CLS) as an intermediate representation. It improves performance over the baseline (in the CLS space). We also discuss various methods used to retain the gain obtained in the phonemic space when converting the output back to the corresponding grapheme representations. Our systems beat the baseline in 3 languages (Track 2) in terms of WER/CER and achieved the highest language ID and dialect ID accuracy among all participating teams (Track 2).
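For readers unfamiliar with the metrics named above, the following is a minimal sketch of how WER and CER are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, over words for WER and over characters for CER. It is not the challenge's official scoring script. The function names (`edit_distance`, `wer`, `cer`) are illustrative, and the choice to drop spaces before computing CER is an assumption, since conventions differ between toolkits.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference tokens to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length.
    Assumption: spaces are removed before scoring."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(wer("a b c d", "a x c"))
    # One substituted character out of three.
    print(cer("abc", "abd"))
```

Because both functions operate on Python strings by code point, the same sketch applies to text in Indic scripts, although a production scorer would typically normalize Unicode and may count grapheme clusters rather than individual code points.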
