CL, AI · Feb 27, 2025

NaijaNLP: A Survey of Nigerian Low-Resource Languages

arXiv:2502.19784v2 · 2 citations · h-index: 7
Originality: Synthesis-oriented
AI Analysis

The paper addresses the lack of a coherent understanding of, and resources for, low-resource NLP in Nigerian languages. As a survey, its contribution is incremental.

This study conducts the first comprehensive review of low-resource NLP research for three major Nigerian languages. It finds that only about 25.1% of the reviewed studies contribute new linguistic resources, which points to a reliance on repurposing existing data.

Nigeria has over 500 languages, yet three of them (Hausa, Yorùbá and Igbo), spoken by over 175 million people, account for about 60% of the spoken languages. However, these languages are categorised as low-resource because they lack sufficient resources to support computational linguistics tasks. Several research efforts and initiatives have been presented; however, a coherent understanding of the state of Natural Language Processing (NLP) for these languages, from grammatical formalisation to the linguistic resources that support complex tasks such as language understanding and generation, is lacking. This study presents the first comprehensive review of advancements in low-resource NLP (LR-NLP) research across the three major Nigerian languages (NaijaNLP). We quantitatively assess the available linguistic resources and identify key challenges. Although a growing body of literature addresses various downstream NLP tasks in Hausa, Igbo and Yorùbá, only about 25.1% of the reviewed studies contribute new linguistic resources. This finding highlights a persistent reliance on repurposing existing data rather than generating novel, high-quality resources. Additionally, language-specific challenges, such as the accurate representation of diacritics, remain under-explored. To advance NaijaNLP, and LR-NLP more broadly, we emphasise the need for intensified efforts in resource enrichment, comprehensive annotation and the development of open collaborative initiatives.
