CL, SD, AS
Aug 22, 2021

Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task conformer

arXiv:2109.03969v2, 9 citations
Originality: Incremental advance
AI Analysis

This work addresses speech recognition for low-resource Indian languages and represents an incremental advance in domain-specific ASR.

The authors tackle low-resource multilingual speech recognition for Indian languages by proposing a multi-task conformer model with dual decoders. They report significant improvements over previous approaches and show that the model outperforms both a transformer-based dual-decoder model and a single-decoder model.

Transformers have recently become very popular for sequence-to-sequence applications such as machine translation and speech recognition. In this work, we propose a multi-task learning-based transformer model for low-resource multilingual speech recognition for Indian languages. Our proposed model consists of a conformer [1] encoder and two parallel transformer decoders. We use a phoneme decoder (PHN-DEC) for the phoneme recognition task and a grapheme decoder (GRP-DEC) to predict the grapheme sequence. We treat phoneme recognition as an auxiliary task in our multi-task learning framework. We jointly optimize the network for both the phoneme and grapheme recognition tasks using Joint CTC-Attention [2] training. We use a conditional decoding scheme to inject the language information into the model before predicting the grapheme sequence. Our experiments show that the proposed approach obtains significant improvement over previous approaches [4]. We also show that our conformer-based dual-decoder approach outperforms both the transformer-based dual-decoder approach and the single-decoder approach. Finally, we compare monolingual ASR models with our proposed multilingual ASR approach.
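
The training objective and conditional decoding described in the abstract can be illustrated with a minimal sketch. It assumes the usual Joint CTC-Attention interpolation (a weighted sum of a CTC loss and an attention-decoder loss) applied to each task, and a second weight that mixes the grapheme and phoneme tasks. The function names, the weights `ctc_weight` and `aux_weight` with their default values, and the `[lang]` tag format are all hypothetical and are not taken from the paper; in particular, how the paper distributes CTC terms between the two tasks is not stated in the abstract.

```python
def joint_ctc_attention(ctc_loss, att_loss, ctc_weight=0.3):
    """Interpolate a CTC loss and an attention-decoder loss.

    ctc_weight is an assumed hyperparameter in [0, 1].
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def multitask_loss(grp_ctc, grp_att, phn_ctc, phn_att,
                   ctc_weight=0.3, aux_weight=0.3):
    """Mix the main grapheme loss with the auxiliary phoneme loss.

    aux_weight (assumed) is the share given to the phoneme task.
    """
    if not 0.0 <= aux_weight <= 1.0:
        raise ValueError("aux_weight must lie in [0, 1]")
    grapheme = joint_ctc_attention(grp_ctc, grp_att, ctc_weight)
    phoneme = joint_ctc_attention(phn_ctc, phn_att, ctc_weight)
    return (1.0 - aux_weight) * grapheme + aux_weight * phoneme


def conditional_target(lang_id, graphemes):
    """Build a grapheme-decoder target that starts with a language tag.

    Placing a tag such as "[hi]" right after <sos> is one assumed way to
    condition the grapheme decoder on language before it emits graphemes.
    """
    return ["<sos>", f"[{lang_id}]", *graphemes, "<eos>"]


if __name__ == "__main__":
    # Placeholder loss values, for illustration only.
    print(multitask_loss(grp_ctc=2.0, grp_att=1.0, phn_ctc=4.0, phn_att=3.0))
    print(conditional_target("hi", ["n", "a", "m"]))
```

In a real system the four loss values would come from the shared encoder's CTC heads and the two decoders, and the weights would be tuned on development data.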
