CL · Oct 24, 2023

NADI 2023: The Fourth Nuanced Arabic Dialect Identification Shared Task

Muhammad Abdul-Mageed, AbdelRahim Elmadany, Chiyu Zhang, El Moatez Billah Nagoudi, Houda Bouamor, Nizar Habash

arXiv:2310.16117v1 · h-index: 24

Originality: Synthesis-oriented

AI Analysis

This work aims to advance Arabic NLP by giving researchers standardized datasets and tasks. Its contribution is incremental, since it is the fourth installment of an ongoing series.

The paper presents the fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023), which covered dialect identification and dialect-to-MSA machine translation. The winning teams achieved 87.27 F1 on Subtask 1, 14.76 BLEU on Subtask 2, and 21.10 BLEU on Subtask 3.
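To make the BLEU figures for the translation subtasks more concrete, here is a minimal sketch of corpus-level BLEU. It is a simplified illustration, not the official NADI scorer: it assumes one reference per segment, plain whitespace tokenization, uniform weights over 1- to 4-grams, and no smoothing. The official scores were presumably computed with a standard toolkit that uses its own tokenization, so numbers from this sketch will not match them exactly. The function names `ngrams` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale (hypothetical helper).

    Assumes exactly one reference per hypothesis and whitespace tokens.
    """
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

For example, `corpus_bleu(["the cat sat on the"], ["the cat sat on the mat"])` has perfect n-gram precision but a hypothesis one token shorter than the reference, so the score is driven entirely by the brevity penalty, 100 · exp(1 − 6/5) ≈ 81.87.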

We describe the findings of the fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023). The objective of NADI is to help advance state-of-the-art Arabic NLP by creating opportunities for teams of researchers to collaboratively compete under standardized conditions. It does so with a focus on Arabic dialects, offering novel datasets and defining subtasks that allow for meaningful comparisons between different approaches. NADI 2023 targeted both dialect identification (Subtask 1) and dialect-to-MSA machine translation (Subtasks 2 and 3). A total of 58 unique teams registered for the shared task, of which 18 participated (with 76 valid submissions during the test phase). Among these, 16 teams participated in Subtask 1, 5 in Subtask 2, and 3 in Subtask 3. The winning teams achieved 87.27 F1 on Subtask 1, 14.76 BLEU on Subtask 2, and 21.10 BLEU on Subtask 3. These results show that all three subtasks remain challenging, which motivates future work in this area. We describe the methods employed by the participating teams and briefly offer an outlook for NADI.
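The abstract reports Subtask 1 results only as "F1". The sketch below assumes macro-averaged F1, which we believe is the official metric for NADI dialect identification: per-label F1 is computed for every dialect label, and those values are averaged without weighting by label frequency. The function name `macro_f1` and the example country labels are hypothetical and for illustration only.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 on a 0-100 scale (hypothetical helper).

    Averages per-label F1 over every label seen in gold or pred, so rare
    dialects weigh as much as frequent ones.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return 100 * sum(scores) / len(scores)


gold = ["Egypt", "Egypt", "Morocco", "Jordan"]
pred = ["Egypt", "Morocco", "Morocco", "Jordan"]
print(round(macro_f1(gold, pred), 2))  # Egypt 2/3, Morocco 2/3, Jordan 1 -> 77.78
```

Macro averaging is a natural choice for country-level dialect labels with uneven class sizes. A system that ignores small dialects scores poorly even if its accuracy on frequent dialects is high.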

