Cost Analysis of Human-corrected Transcription for Predominately Oral Languages
The paper provides a baseline for estimating the cost of developing NLP resources for low-resource languages. The contribution is incremental but practical for researchers and practitioners in this domain.
This paper tackles the challenge of creating speech datasets for low-resource, predominantly oral languages by analyzing the human labor cost of correcting ASR-generated transcriptions for Bambara. It finds that accurately transcribing one hour of speech takes an average of 30 hours of human labor in lab conditions and 36 hours in field conditions.
Creating speech datasets for low-resource languages is a critical yet poorly understood challenge, particularly regarding the actual cost in human labor. This paper investigates the time and complexity required to produce high-quality annotated speech data for a subset of low-resource languages, namely low-literacy Predominately Oral Languages, focusing on Bambara, a Manding language of Mali. Through a one-month field study involving ten transcribers with native proficiency, we analyze the correction of ASR-generated transcriptions of 53 hours of Bambara voice data. We report that it takes, on average, 30 hours of human labor to accurately transcribe one hour of speech data under laboratory conditions and 36 hours under field conditions. The study provides a baseline and practical insights for creating NLP resources for a large class of languages with comparable profiles.
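
To show how the reported ratios could feed a planning estimate, here is a minimal back-of-envelope sketch in Python. Only the two ratios (30 and 36 hours of labor per hour of speech) and the 53-hour corpus size come from the abstract. The function names, the 8-hour working day, and the linear scaling are illustrative assumptions, not the authors' method, and the results are projections rather than figures reported by the study.

```python
# Ratios reported in the abstract: hours of human labor per hour of speech.
LAB_RATIO = 30.0    # laboratory conditions
FIELD_RATIO = 36.0  # field conditions


def estimate_labor_hours(audio_hours: float, ratio: float) -> float:
    """Project total human labor in hours, assuming cost scales linearly
    with the amount of audio (an assumption, not a finding of the paper)."""
    if audio_hours < 0 or ratio <= 0:
        raise ValueError("audio_hours must be >= 0 and ratio must be > 0")
    return audio_hours * ratio


def estimate_working_days(labor_hours: float, transcribers: int,
                          hours_per_day: float = 8.0) -> float:
    """Convert labor hours into working days for a team.
    The 8-hour day is a hypothetical default, not taken from the source."""
    if transcribers <= 0 or hours_per_day <= 0:
        raise ValueError("transcribers and hours_per_day must be > 0")
    return labor_hours / (transcribers * hours_per_day)


# Projection for a corpus the size of the one in the study (53 hours).
lab_hours = estimate_labor_hours(53, LAB_RATIO)
field_hours = estimate_labor_hours(53, FIELD_RATIO)
print(f"Lab conditions:   {lab_hours:.0f} labor hours")
print(f"Field conditions: {field_hours:.0f} labor hours")
print(f"Field, 10 transcribers: "
      f"{estimate_working_days(field_hours, 10):.2f} working days")
```

Under these assumptions, a 53-hour corpus projects to 1,590 labor hours at the laboratory ratio and 1,908 at the field ratio. Swapping in a different corpus size, team size, or working day gives a first-order budget for a language with a comparable profile.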