CL · Feb 4

Linguistically Informed Evaluation of Multilingual ASR for African Languages

arXiv:2602.04716v1 · h-index: 118
Originality: Incremental advance
AI Analysis

This paper addresses an evaluation gap for ASR models on African languages. The advance is incremental: it proposes refined metrics rather than a new model.

The paper tackles the problem that Word Error Rate (WER) mischaracterizes ASR performance for African languages by collapsing phonological, tone, and other linguistic errors into a single lexical error. It shows that Feature Error Rate (FER) and Tone Error Rate (TER) reveal linguistically salient error patterns; for Yoruba, for example, WER=0.788 while FER=0.151.

Word Error Rate (WER) mischaracterizes ASR models' performance for African languages by combining phonological, tone, and other linguistic errors into a single lexical error. By contrast, Feature Error Rate (FER) has recently attracted attention as a viable metric that reveals linguistically meaningful errors in models' performance. In this paper, we evaluate three speech encoders on two African languages, complementing WER with CER and FER and adding a tone-aware extension (TER). We show that by computing errors on phonological features, FER and TER reveal linguistically salient error patterns even when word-level accuracy remains low. Our results reveal that models perform better on segmental features, while tones (especially mid and downstep) remain the most challenging features. Results on Yoruba show a striking differential in metrics, with WER=0.788, CER=0.305, and FER=0.151. Similarly, for Uneme (an endangered language absent from pretraining data), a model with near-total WER and 0.461 CER achieves a relatively low FER of 0.267. This indicates that model error is often attributable to individual phonetic feature errors, which all-or-nothing metrics like WER obscure.
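To make the gap between these metrics concrete, here is a minimal Python sketch, not the paper's implementation. It assumes that all four rates are edit distances normalized by reference length, that FER charges a substitution by the fraction of phonological features that differ, and that tones can be read off Yoruba-style diacritics (grave = low, acute = high, unmarked = mid). The feature table `TOY_FEATURES` and the helpers `feature_cost` and `tones` are hypothetical and exist only for illustration; the paper's feature inventory and its TER definition may differ.

```python
import unicodedata

def edit_distance(ref, hyp, sub_cost):
    """Levenshtein distance with a pluggable substitution cost."""
    prev = [float(j) for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        curr = [float(i)]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1.0,                 # deletion
                curr[j - 1] + 1.0,             # insertion
                prev[j - 1] + sub_cost(r, h),  # substitution or match
            ))
        prev = curr
    return prev[-1]

def exact_cost(a, b):
    """All-or-nothing substitution cost, as used by WER and CER."""
    return 0.0 if a == b else 1.0

# Hypothetical toy table: (syllabic, voiced, nasal, labial). Illustration only.
TOY_FEATURES = {
    "b": (0, 1, 0, 1),
    "p": (0, 0, 0, 1),
    "m": (0, 1, 1, 1),
    "a": (1, 1, 0, 0),
}

def feature_cost(a, b):
    """Fraction of features that differ; falls back to 0/1 for unknown symbols."""
    if a in TOY_FEATURES and b in TOY_FEATURES:
        fa, fb = TOY_FEATURES[a], TOY_FEATURES[b]
        return sum(x != y for x, y in zip(fa, fb)) / len(fa)
    return exact_cost(a, b)

TONE_MARKS = {"\u0300": "L", "\u0301": "H"}  # combining grave / acute

def tones(text):
    """One tone label per vowel; unmarked vowels count as mid (assumption).
    Tone-bearing syllabic nasals are ignored in this sketch."""
    out, on_vowel = [], False
    for ch in unicodedata.normalize("NFD", text.lower()):
        if ch in "aeiou":
            out.append("M")
            on_vowel = True
        elif unicodedata.combining(ch):
            if on_vowel and ch in TONE_MARKS:
                out[-1] = TONE_MARKS[ch]
        else:
            on_vowel = False
    return out

def rate(ref, hyp, sub_cost=exact_cost):
    """Edit distance normalized by reference length."""
    return edit_distance(ref, hyp, sub_cost) / max(len(ref), 1)

ref, hyp = "bama", "pama"  # made-up strings, one voicing error
wer = rate(ref.split(), hyp.split())
cer = rate(ref, hyp)
fer = rate(ref, hyp, feature_cost)
ter = rate(tones("bàbá"), tones("bàba"))  # second tone H misread as M

print(f"WER={wer} CER={cer} FER={fer} TER={ter}")
```

On these made-up strings, a single voicing error yields WER=1.0, CER=0.25, and FER=0.0625: the same mistake counts as a whole wrong word, one wrong character out of four, or one wrong feature out of sixteen. This is the same ordering (WER > CER > FER) that the abstract reports for Yoruba and Uneme, shown here on toy data.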

