Ciira wa Maina

58.6 CL Apr 9

AfriVoices-KE: A Multilingual Speech Dataset for Kenyan Languages

Lilian Wanzare, Cynthia Amol, Ezekiel Maina et al.

AfriVoices-KE is a large-scale multilingual speech dataset comprising approximately 3,000 hours of audio across five Kenyan languages: Dholuo, Kikuyu, Kalenjin, Maasai, and Somali. The dataset includes 750 hours of scripted speech and 2,250 hours of spontaneous speech, collected from 4,777 native speakers across diverse regions and demographics. This work addresses the critical underrepresentation of African languages in speech technology by providing a high-quality, linguistically diverse resource. Data collection followed a dual methodology: scripted recordings drew from compiled text corpora, translations, and domain-specific generated sentences spanning eleven domains relevant to the Kenyan context, while unscripted speech was elicited through textual and image prompts to capture natural linguistic variation and dialectal nuances. A customized mobile application enabled contributors to record using smartphones. Quality assurance operated at multiple layers, encompassing automated signal-to-noise ratio validation prior to recording and human review for content accuracy. Though the project encountered challenges common to low-resource settings, including unreliable infrastructure, device compatibility issues, and community trust barriers, these were mitigated through local mobilizers, stakeholder partnerships, and adaptive training protocols. AfriVoices-KE provides a foundational resource for developing inclusive automatic speech recognition and text-to-speech systems, while advancing the digital preservation of Kenya's linguistic heritage.

SD Aug 10, 2021

An empirical investigation into audio pipeline approaches for classifying bird species

David Behr, Ciira wa Maina, Vukosi Marivate

This paper investigates aspects of an audio classification pipeline appropriate for monitoring bird species on edge devices. These aspects include transfer learning, data augmentation, and model optimization. The hope is that the resulting models will be good candidates for deployment on edge devices to monitor bird populations. Two classification approaches are considered: one explores the effectiveness of a traditional Deep Neural Network (DNN), and the other makes use of convolutional layers. This study aims to contribute empirical evidence of the merits and demerits of each approach.
