AS CL LG SD
Oct 31, 2020

Multilingual Bottleneck Features for Improving ASR Performance of Code-Switched Speech in Under-Resourced Languages

Trideba Padhi, Astik Biswas, Febe De Wet, Ewald van der Westhuizen, Thomas Niesler

arXiv:2011.03118v1
2.34 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of limited annotated data for code-switched speech recognition in under-resourced languages, offering an incremental improvement by leveraging multilingual resources.

The study addresses automatic speech recognition of code-switched speech in under-resourced African languages by adding multilingual bottleneck features to the acoustic model input. It reports clear performance improvements over a baseline trained without these features for English-isiZulu, English-isiXhosa, English-Sesotho and English-Setswana speech.

In this work, we explore the benefits of using multilingual bottleneck features (mBNF) in acoustic modelling for the automatic speech recognition of code-switched (CS) speech in African languages. The unavailability of annotated corpora in the languages of interest has always been a primary challenge when developing speech recognition systems for this severely under-resourced type of speech. Hence, it is worthwhile to investigate the potential of using speech corpora available for other better-resourced languages to improve speech recognition performance. To achieve this, we train an mBNF extractor using nine Southern Bantu languages that form part of the freely available multilingual NCHLT corpus. We append these mBNFs to the existing MFCCs, pitch features and i-vectors to train acoustic models for automatic speech recognition (ASR) in the target code-switched languages. Our results show that the inclusion of the mBNFs leads to clear performance improvements over a baseline trained without them for code-switched English-isiZulu, English-isiXhosa, English-Sesotho and English-Setswana speech.
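The feature-appending step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `append_features`, the feature dimensions (40 MFCCs, 3 pitch features, 39 mBNFs, 100-dimensional i-vector) and the use of a single i-vector per utterance are all assumptions chosen for illustration, and random arrays stand in for real features.

```python
import numpy as np

def append_features(mfcc, pitch, mbnf, ivector):
    """Concatenate per-frame features with an utterance-level i-vector.

    mfcc, pitch, mbnf: arrays of shape (n_frames, dim).
    ivector: array of shape (ivector_dim,), repeated for every frame
    (an assumed simplification; real systems may update it over time).
    """
    n_frames = mfcc.shape[0]
    if pitch.shape[0] != n_frames or mbnf.shape[0] != n_frames:
        raise ValueError("per-frame feature streams must have equal length")
    # Repeat the i-vector so each frame carries the same speaker information.
    ivec_tiled = np.tile(ivector, (n_frames, 1))
    return np.concatenate([mfcc, pitch, mbnf, ivec_tiled], axis=1)

# Hypothetical dimensions; random values stand in for extracted features.
rng = np.random.default_rng(0)
n_frames = 200
mfcc = rng.standard_normal((n_frames, 40))
pitch = rng.standard_normal((n_frames, 3))
mbnf = rng.standard_normal((n_frames, 39))
ivector = rng.standard_normal(100)

features = append_features(mfcc, pitch, mbnf, ivector)
print(features.shape)  # (200, 182)
```

Under these assumptions, each frame's input vector is the original MFCC, pitch and i-vector features with the mBNFs added alongside them, so the baseline system corresponds to the same concatenation with the `mbnf` block left out.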

