CL · AI · Jul 23, 2025

Natural Language Processing for Tigrinya: Current State and Future Directions

arXiv:2507.17974v2 · 2 citations · h-index: 5
Originality: Synthesis-oriented
AI Analysis

The paper addresses the underrepresentation of Tigrinya in NLP and gives researchers both a reference and a roadmap. As a survey, its contribution is incremental rather than novel.

This paper surveys NLP research for Tigrinya, analyzing over 40 studies from 2011 to 2025 to review resources, models, and applications, revealing a shift from rule-based to neural systems driven by resource creation.

Despite being spoken by millions of people, Tigrinya remains severely underrepresented in Natural Language Processing (NLP) research. This work presents a comprehensive survey of NLP research for Tigrinya, analyzing over 40 studies spanning more than a decade of work from 2011 to 2025. We systematically review the current state of computational resources, models, and applications across ten distinct downstream tasks, including morphological processing, machine translation, speech recognition, and question-answering. Our analysis reveals a clear trajectory from foundational, rule-based systems to modern neural architectures, with progress consistently unlocked by resource creation milestones. We identify key challenges rooted in Tigrinya's morphological complexity and resource scarcity, while highlighting promising research directions, including morphology-aware modeling, cross-lingual transfer, and community-centered resource development. This work serves as both a comprehensive reference for researchers and a roadmap for advancing Tigrinya NLP. A curated metadata of the surveyed studies and resources is made publicly available.

