Speech Recognition for Automatically Assessing Afrikaans and isiXhosa Preschool Oral Narratives
This work addresses the need for language development assessment tools for preschool children who speak under-resourced languages. It is an incremental validation of existing child-speech ASR strategies in a setting that has received little prior study.
To assess language development in Afrikaans and isiXhosa preschool children automatically, the researchers built ASR systems for the children's oral narratives. In-domain adult data gave the largest improvement, especially when combined with voice conversion. Semi-supervised learning helped for both languages, while parameter-efficient fine-tuning helped only for Afrikaans.
We develop automatic speech recognition (ASR) systems for stories told by Afrikaans and isiXhosa preschool children. Oral narratives provide a way to assess children's language development before they learn to read. We consider a range of prior child-speech ASR strategies to determine which is best suited to this unique setting. Using Whisper and only 5 minutes of transcribed in-domain child speech, we find that additional in-domain adult data (adult speech matching the story domain) provides the biggest improvement, especially when coupled with voice conversion. Semi-supervised learning also helps for both languages, while parameter-efficient fine-tuning helps on Afrikaans but not on isiXhosa (which is under-represented in the Whisper model). Few child-speech studies look at non-English data, and even fewer at the preschool ages of 4 and 5. Our work therefore represents a unique validation of a wide range of previous child-speech ASR strategies in an under-explored setting.