CL SD AS · Aug 22, 2021

Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task conformer

arXiv:2109.03969v2 · 9 citations

Originality: Incremental advance

AI Analysis

This work addresses speech recognition for low-resource Indian languages and represents an incremental advance in domain-specific ASR.

The authors tackled low-resource multilingual speech recognition for Indian languages by proposing a multi-task conformer model with dual decoders, achieving significant improvements over previous approaches and outperforming transformer-based and single-decoder methods.

Transformers have recently become very popular for sequence-to-sequence applications such as machine translation and speech recognition. In this work, we propose a multi-task learning-based transformer model for low-resource multilingual speech recognition for Indian languages. Our proposed model consists of a conformer [1] encoder and two parallel transformer decoders. We use a phoneme decoder (PHN-DEC) for the phoneme recognition task and a grapheme decoder (GRP-DEC) to predict the grapheme sequence. We treat phoneme recognition as an auxiliary task in our multi-task learning framework. We jointly optimize the network for both the phoneme and grapheme recognition tasks using Joint CTC-Attention [2] training. We use a conditional decoding scheme to inject the language information into the model before predicting the grapheme sequence. Our experiments show that our proposed approach achieves a significant improvement over previous approaches [4]. We also show that our conformer-based dual-decoder approach outperforms both the transformer-based dual-decoder approach and the single-decoder approach. Finally, we compare monolingual ASR models with our proposed multilingual ASR approach.
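
The training objective described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the loss weights or the exact form of conditional decoding, so `ctc_weight`, `phoneme_weight`, the `TaskLosses` container, and the `<lang>` tag format are all assumptions made for illustration. The sketch interpolates CTC and attention losses per decoder, mixes the grapheme and auxiliary phoneme losses, and shows one plausible way to condition the grapheme decoder on language by placing a language tag at the start of the target sequence.

```python
from dataclasses import dataclass


@dataclass
class TaskLosses:
    """CTC and attention losses for one decoder (hypothetical container)."""
    ctc: float
    attention: float


def joint_ctc_attention(losses: TaskLosses, ctc_weight: float) -> float:
    """Interpolate the CTC and attention losses for a single task."""
    return ctc_weight * losses.ctc + (1.0 - ctc_weight) * losses.attention


def multitask_loss(grapheme: TaskLosses, phoneme: TaskLosses,
                   ctc_weight: float = 0.3,
                   phoneme_weight: float = 0.3) -> float:
    """Combine the main (grapheme) and auxiliary (phoneme) objectives.

    Both weights are assumed values, not figures from the paper.
    """
    grp = joint_ctc_attention(grapheme, ctc_weight)
    phn = joint_ctc_attention(phoneme, ctc_weight)
    return (1.0 - phoneme_weight) * grp + phoneme_weight * phn


def add_language_token(graphemes: list[str], language: str) -> list[str]:
    """Prefix the grapheme targets with a language tag (assumed format),
    so the decoder emits the language before any grapheme."""
    return [f"<{language}>"] + graphemes
```

In a real system the two loss terms per task would come from a CTC head on the shared encoder and from the corresponding attention decoder; plain floats stand in for them here so the weighting logic can be read in isolation.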
