CL, AI · Oct 21, 2020

NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task

Muhammad Abdul-Mageed, Chiyu Zhang, Houda Bouamor, Nizar Habash

arXiv:2010.11334

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for nuanced dialect identification in Arabic natural language processing. Its contribution is incremental: it builds on existing dialect identification efforts by introducing finer granularity.

The authors organized the first shared task for fine-grained Arabic dialect identification at the country and province levels. A total of 61 teams registered, and 18 and 9 teams submitted systems for Subtask 1 and Subtask 2, respectively.

We present the results and findings of the First Nuanced Arabic Dialect Identification Shared Task (NADI). The shared task includes two subtasks: country-level dialect identification (Subtask 1) and province-level sub-dialect identification (Subtask 2). The data for the shared task cover a total of 100 provinces from 21 Arab countries and were collected from Twitter. As such, NADI is the first shared task to target naturally occurring fine-grained dialectal text at the sub-country level. A total of 61 teams from 25 countries registered to participate, reflecting the community's interest in this area. We received 47 submissions for Subtask 1 from 18 teams and 9 submissions for Subtask 2 from 9 teams.
