AfriVox-v2: A Domain-Verticalized Benchmark for In-the-Wild African Speech Recognition
This benchmark addresses the underrepresentation of African languages in speech recognition evaluation, providing a realistic testbed for developers building localized voice AI.
AfriVox-v2 is a benchmark for African speech recognition that includes in-the-wild unscripted audio and domain-specific evaluations across ten sectors. It reveals a significant generalization gap in modern speech models for noisy, specialized African contexts.
Recent large language models (LLMs) show strong speech recognition and translation capabilities for high-resource languages. African languages, however, remain heavily underrepresented in benchmarks, which limits the practical use of these models in low-resource settings. Earlier benchmarks tested African languages and accents, but they did not cover real-world noise exhaustively or evaluate at the level of individual domains. We present AfriVox-v2, a comprehensive benchmark designed to test speech models under realistic African deployment conditions. AfriVox-v2 introduces in-the-wild unscripted audio for all supported languages. We also introduce strict domain verticalization: we evaluate model accuracy across ten sectors, including government, finance, health, and agriculture, and we conduct targeted tests on numbers and named entities. Finally, we benchmark a new generation of speech models, including Sahara-v2, Gemini 3 Flash, and the Omnilingual CTC models. Our results expose the generalization gap of modern speech models in specialized, noisy African contexts and provide a reliable blueprint for developers building localized voice AI.
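To make the idea of domain verticalization concrete, the following is a minimal sketch of per-sector scoring. It assumes word error rate (WER) as the accuracy measure, which the abstract does not specify, and the function names, sector labels, and transcripts are hypothetical illustrations rather than part of AfriVox-v2.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_domain(samples):
    """Return {domain: WER}, pooling errors and reference words per domain."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for domain, reference, hypothesis in samples:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors[domain] += edit_distance(ref, hyp)
        words[domain] += len(ref)
    return {d: errors[d] / words[d] for d in words if words[d] > 0}


# Hypothetical toy transcripts, not taken from the benchmark.
samples = [
    ("finance", "send five thousand naira to ada", "send five thousand naira to ada"),
    ("health", "take two tablets after meals", "take two tablet after meal"),
]
scores = wer_by_domain(samples)
```

Pooling errors and reference words per sector, rather than averaging per-utterance rates, keeps short utterances from dominating a sector's score. Either choice is defensible, and the benchmark may do this differently.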