CL · Nov 25, 2023

Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching

Stanford
arXiv:2311.15077v1 · 133 citations · h-index: 109
Originality: Synthesis-oriented
AI Analysis

This work addresses speech recognition challenges for speakers of low-resource African languages who codeswitch, offering an incremental improvement using existing methods on new data.

The paper tackles speech recognition for low-resource African languages with codeswitching by finetuning self-supervised multilingual speech representations and augmenting them with n-gram language models, reducing word error rates by up to 20% absolute compared to baselines.

While many speakers of low-resource languages regularly code-switch between their languages and other regional languages or English, datasets of code-switched speech are too small to train bespoke acoustic models from scratch or to do language model rescoring. Here we propose finetuning self-supervised speech representations such as wav2vec 2.0 XLSR to recognize code-switched data. We find that finetuning self-supervised multilingual representations and augmenting them with n-gram language models trained from transcripts reduces absolute word error rates by up to 20% compared to baselines of hybrid models trained from scratch on code-switched data. Our findings suggest that in circumstances with limited training data, finetuning self-supervised representations is a viable and better-performing solution.
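The headline figure above is an absolute reduction in word error rate (WER), which is easy to confuse with a relative one. The following is a minimal sketch of how WER is conventionally computed (word-level edit distance divided by reference length) and how the two kinds of reduction differ. The function names and the example WER values are hypothetical illustrations, not the paper's evaluation code or results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, expressed in the same units as the inputs."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5

# Hypothetical WERs chosen only to illustrate the two measures.
baseline, finetuned = 0.55, 0.35
print(round(absolute_reduction(baseline, finetuned), 3))
print(round(relative_reduction(baseline, finetuned), 3))
```

With these made-up values, a drop from 55% to 35% WER is a 20-point absolute reduction but a larger relative reduction, which is why the abstract's wording specifies "absolute".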

