CL LG SD AS · Apr 2, 2024

Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context

arXiv:2404.02000v3 · 9 citations · h-index: 10

Originality: Incremental advance

AI Analysis

This work addresses the limited availability of speech representations for sub-Saharan African languages, offering a smaller and more data-efficient model for multilingual speech tasks in the region.

To address the scarcity of speech models for African languages, the researchers trained the first self-supervised multilingual speech model exclusively on nearly 60,000 hours of unlabeled African speech. The model achieves competitive ASR results with 7x less data and 6x fewer parameters than the baseline, and it outperforms the FLEURS baselines by over 22% in LID accuracy.

We present the first self-supervised multilingual speech model trained exclusively on African speech. The model learned from nearly 60,000 hours of unlabeled speech segments in 21 languages and dialects spoken in sub-Saharan Africa. On the SSA subset of the FLEURS-102 dataset, our approach, based on a HuBERT$_{base}$ (0.09B) architecture, shows competitive results on the automatic speech recognition (ASR) downstream task compared to the w2v-bert-51 (0.6B) pre-trained model proposed in the FLEURS benchmark, while being more efficient: it uses 7x less data and 6x fewer parameters. Furthermore, on a language identification (LID) downstream task, our approach outperforms the accuracy of the FLEURS baselines by over 22%.
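
As a quick sanity check on the efficiency figures quoted above, the following minimal sketch recomputes the parameter ratio from the two stated model sizes and derives the baseline data volume that the "7x" figure would imply. The constant and function names are illustrative, and the implied baseline hours are a back-of-the-envelope estimate from the abstract's own numbers, not a figure reported in the abstract.

```python
# Figures quoted in the abstract (model sizes in billions of parameters).
HUBERT_BASE_PARAMS_B = 0.09   # HuBERT base, as stated
W2V_BERT_51_PARAMS_B = 0.6    # w2v-bert-51 baseline, as stated
PRETRAIN_HOURS = 60_000       # "nearly 60,000 hours", used here as a round figure
DATA_FACTOR = 7               # "7x less data", as stated


def size_ratio(baseline: float, ours: float) -> float:
    """Return how many times larger the baseline is than our model."""
    return baseline / ours


param_ratio = size_ratio(W2V_BERT_51_PARAMS_B, HUBERT_BASE_PARAMS_B)

# Assumption: the baseline's data volume is estimated by inverting the
# "7x" claim. This is an implied value, not one given in the abstract.
implied_baseline_hours = DATA_FACTOR * PRETRAIN_HOURS

print(f"parameter ratio: {param_ratio:.2f}x")
print(f"implied baseline pre-training data: ~{implied_baseline_hours:,} hours")
```

The computed parameter ratio falls between 6 and 7, which is consistent with the "6x fewer parameters" claim when rounded down.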
