CL, Feb 18

Aladdin-FTI @ AMIYA Three Wishes for Arabic NLP: Fidelity, Diglossia, and Multidialectal Generation

arXiv:2602.16290v1, 1 citation, h-index: 5
Originality: Synthesis-oriented
AI Analysis

The paper addresses the scarcity of computational resources for Arabic dialects. The contribution is incremental: it applies existing LLM advances to a specific domain.

The paper tackles the under-representation of Arabic dialects in NLP by developing Aladdin-FTI, a system that generates text in five dialects and translates between those dialects, Modern Standard Arabic, and English. The code and trained model are publicly available.

Arabic dialects have long been under-represented in Natural Language Processing (NLP) research due to their non-standardization and high variability, which pose challenges for computational modeling. Recent advances in the field, such as Large Language Models (LLMs), offer promising avenues to address this gap by enabling Arabic to be modeled as a pluricentric language rather than a monolithic system. This paper presents Aladdin-FTI, our submission to the AMIYA shared task. The proposed system is designed to both generate and translate dialectal Arabic (DA). Specifically, the model supports text generation in Moroccan, Egyptian, Palestinian, Syrian, and Saudi dialects, as well as bidirectional translation between these dialects, Modern Standard Arabic (MSA), and English. The code and trained model are publicly available.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
