Fast Development of ASR in African Languages using Self Supervised Speech Representation Learning
This work addresses the problem of developing speech recognition for under-resourced African languages, though it is incremental as it applies existing self-supervised methods to new data.
The paper tackles low-resource automatic speech recognition for Wolof, Ga, and Somali by using self-supervised pre-training on raw speech, achieving functional ASR systems with only 1 hour of transcribed training data per language.
This paper describes the results of an informal collaboration launched during the African Master of Machine Intelligence (AMMI) in June 2020. After a series of lectures and labs on speech data collection using mobile applications and on self-supervised representation learning from speech, a small group of students and the lecturer continued working on an automatic speech recognition (ASR) project for three languages: Wolof, Ga, and Somali. This paper describes how the data was collected and how ASR systems were developed with a small amount (1h) of transcribed speech as training data. In these low-resource conditions, pre-training a model on large amounts of raw speech was fundamental for the efficiency of the ASR systems developed.