CL · Dec 24, 2025

Opportunities and Challenges of Natural Language Processing for Low-Resource Senegalese Languages in Social Science Research

arXiv:2601.09716v1 · h-index: 6
Originality: Synthesis-oriented
AI Analysis

The paper addresses the underrepresentation of African languages in NLP, particularly for social science research in Senegal, by outlining a roadmap for sustainable, community-centered ecosystems. Its contribution is incremental, since it builds on existing initiatives.

This paper provides the first comprehensive overview of Natural Language Processing (NLP) progress and challenges for six low-resource Senegalese languages, synthesizing factors affecting their digital readiness and identifying gaps in data, tools, and benchmarks, while offering a centralized GitHub repository to facilitate collaboration and reproducibility.

Natural Language Processing (NLP) is rapidly transforming research methodologies across disciplines, yet African languages remain largely underrepresented in this technological shift. This paper provides the first comprehensive overview of NLP progress and challenges for the six national languages officially recognized by the Senegalese Constitution: Wolof, Pulaar, Sereer, Joola, Mandingue, and Soninke. We synthesize linguistic, sociotechnical, and infrastructural factors that shape their digital readiness and identify gaps in data, tools, and benchmarks. Building on existing initiatives and research works, we analyze ongoing efforts in text normalization, machine translation, and speech processing. We also provide a centralized GitHub repository that compiles publicly accessible resources for a range of NLP tasks across these languages, designed to facilitate collaboration and reproducibility. A special focus is devoted to the application of NLP to the social sciences, where multilingual transcription, translation, and retrieval pipelines can significantly enhance the efficiency and inclusiveness of field research. The paper concludes by outlining a roadmap toward sustainable, community-centered NLP ecosystems for Senegalese languages, emphasizing ethical data governance, open resources, and interdisciplinary collaboration.
