CL · AI · Oct 21, 2020

NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task

arXiv:2010.11334v3 · 1007 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for fine-grained dialect identification in Arabic natural language processing. The contribution is incremental: it builds on existing dialect identification efforts by extending them to a finer, province-level granularity.

The authors tackled the problem of identifying Arabic dialects at the country and province levels by organizing the first shared task on fine-grained dialect identification. The task attracted 61 registered teams, with submissions from 18 teams for Subtask 1 and 9 teams for Subtask 2.

We present the results and findings of the First Nuanced Arabic Dialect Identification Shared Task (NADI). This shared task includes two subtasks: country-level dialect identification (Subtask 1) and province-level sub-dialect identification (Subtask 2). The data for the shared task cover a total of 100 provinces from 21 Arab countries and are collected from Twitter. As such, NADI is the first shared task to target naturally occurring fine-grained dialectal text at the sub-country level. A total of 61 teams from 25 countries registered to participate in the tasks, reflecting the community's interest in this area. We received 47 submissions for Subtask 1 from 18 teams and 9 submissions for Subtask 2 from 9 teams.
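Because every province belongs to exactly one country, a province-level prediction can always be collapsed into a country-level one. The sketch below illustrates this relationship and why the two subtasks can score the same system very differently. It assumes macro-averaged F1 as the scoring metric (the summary above does not state the metric), and the province labels and the `PROVINCE_TO_COUNTRY` map are hypothetical, not NADI's actual label set.

```python
from collections import Counter

# Hypothetical province -> country map (illustrative labels, not NADI's label set).
PROVINCE_TO_COUNTRY = {
    "eg_cairo": "egypt",
    "eg_alexandria": "egypt",
    "ma_rabat": "morocco",
    "ma_fes": "morocco",
}


def to_country(labels):
    """Collapse province-level labels into their (assumed) parent countries."""
    return [PROVINCE_TO_COUNTRY[label] for label in labels]


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the union of gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[g] += 1  # gold class g was missed
    scores = []
    for label in sorted(set(gold) | set(pred)):
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        prec = tp[label] / prec_den if prec_den else 0.0
        rec = tp[label] / rec_den if rec_den else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


# Toy data: every error confuses two provinces of the same country.
gold = ["eg_cairo", "eg_alexandria", "ma_rabat", "ma_fes"]
pred = ["eg_alexandria", "eg_alexandria", "ma_rabat", "ma_rabat"]

print(round(macro_f1(gold, pred), 3))                  # province level: 0.333
print(macro_f1(to_country(gold), to_country(pred)))    # country level: 1.0
```

In this toy case all errors stay within a country, so they vanish at the country level; this is one way to see why sub-country identification is the harder of the two subtasks.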
