CL | Nov 18, 2025

AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR

Gabrial Zencha Ashungafac, Mardhiyah Sanni, Busayo Awobade, Alex Gichamba, Tobi Olatunji

arXiv:2511.14255v1 | 2 citations | Has Code | IJCNLP-AACL

Originality: Synthesis-oriented

AI Analysis

This work provides a domain-specific benchmark for African-accented English ASR, addressing a gap for underserved communities. The contribution is incremental, since it builds on existing datasets and methods.

The authors tackled the lack of evaluation benchmarks for African English accents in speech recognition by creating AfriSpeech-MultiBench, which tests over 100 accents across 10+ countries and 7 domains, revealing that open-source ASR models excel in spontaneous speech but degrade on noisy dialogue, while multimodal LLMs are accent-robust but struggle with named entities.

Recent advances in speech-enabled AI, including Google's NotebookLM and OpenAI's speech-to-speech API, are driving widespread interest in voice interfaces globally. Despite this momentum, there is no publicly available application-specific model evaluation that caters to Africa's linguistic diversity. We present AfriSpeech-MultiBench, the first domain-specific evaluation suite for over 100 African English accents across 10+ countries and seven application domains: Finance, Legal, Medical, General Dialogue, Call Center, Named Entities, and Hallucination Robustness. We benchmark a diverse range of open, closed, unimodal ASR and multimodal LLM-based speech recognition systems using both spontaneous and non-spontaneous speech conversations drawn from various open African-accented English speech datasets. Our empirical analysis reveals systematic variation: open-source ASR models excel in spontaneous speech contexts but degrade on noisy, non-native dialogue; multimodal LLMs are more accent-robust yet struggle with domain-specific named entities; proprietary models deliver high accuracy on clean speech but vary significantly by country and domain. Models fine-tuned on African English achieve competitive accuracy with lower latency, a practical advantage for deployment, but hallucinations remain a major problem for most SOTA models. By releasing this comprehensive benchmark, we empower practitioners and researchers to select voice technologies suited to African use cases, fostering inclusive voice applications for underserved communities.
