CL · May 27, 2025

Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead

arXiv:2505.21315v3 · 14 citations · h-index: 35 · EMNLP
Originality: Synthesis-oriented
AI Analysis

This survey addresses the digital divide for African linguistic communities by mapping progress in NLP, though it is incremental as it synthesizes existing work.

The paper tackles the underrepresentation of African languages in NLP by analyzing 884 research papers from the past five years, identifying trends and outlining directions for more inclusive research.

With over 2,000 languages and potentially millions of speakers, Africa represents one of the richest linguistic regions in the world. Yet, this diversity is scarcely reflected in state-of-the-art natural language processing (NLP) systems and large language models (LLMs), which predominantly support a narrow set of high-resource languages. This exclusion not only limits the reach and utility of modern NLP technologies but also risks widening the digital divide across linguistic communities. Nevertheless, NLP research on African languages is active and growing. In recent years, there has been a surge of interest in this area, driven by several factors, including the creation of multilingual language resources, the rise of community-led initiatives, and increased support through funding programs. In this survey, we analyze 884 research papers on NLP for African languages published over the past five years, offering a comprehensive overview of recent progress across core tasks. We identify key trends shaping the field and conclude by outlining promising directions to foster more inclusive and sustainable NLP research for African languages.
