CL · Mar 1, 2021

Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task

arXiv:2103.01065v1 · 801 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses Arabic dialect identification, but its contribution is incremental: it adapts an existing pretrained model (MARBERT) to a shared task.

The paper tackles the Nuanced Arabic Dialect Identification (NADI) shared task, which asks for the geographic origin of Arabic utterances. It reports state-of-the-art results with an ensemble of MARBERT-based models, reaching an F1-score of 34.03% on the country-level Dialectal Arabic (DA) development set, an improvement of 7.63% over previous work.

In this paper, we tackle the Nuanced Arabic Dialect Identification (NADI) shared task (Abdul-Mageed et al., 2021) and demonstrate state-of-the-art results on all four of its subtasks. The tasks are to identify the geographic origin of short Dialectal Arabic (DA) and Modern Standard Arabic (MSA) utterances at both the country and the province level. Our final model is an ensemble of variants built on top of MARBERT; it achieves an F1-score of 34.03% for DA on the country-level development set, an improvement of 7.63% over previous work.
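The abstract does not say how the MARBERT variants are combined or which F1 average is reported. A minimal sketch, assuming soft voting (averaging each variant's per-class probabilities, then taking the argmax) as the ensembling rule and macro-averaged F1 as the metric (to our understanding, the official NADI metric), could look like this. The function names `soft_vote` and `macro_f1` are hypothetical, and the probabilities would in practice come from fine-tuned MARBERT classifiers that are not shown here.

```python
from typing import List, Sequence


def soft_vote(prob_lists: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Soft-voting ensemble (an assumed combination rule, not necessarily the paper's).

    prob_lists[m][i][c] is model m's probability that example i belongs to class c.
    Returns the argmax of the averaged probabilities for each example.
    """
    n_models = len(prob_lists)
    n_examples = len(prob_lists[0])
    preds = []
    for i in range(n_examples):
        n_classes = len(prob_lists[0][i])
        # Average the class probabilities across all ensemble members.
        avg = [
            sum(prob_lists[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        # Pick the class with the highest averaged probability (first one on ties).
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def macro_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = set(gold) | set(pred)
    f1s = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(
            2 * precision * recall / (precision + recall) if precision + recall else 0.0
        )
    return sum(f1s) / len(f1s)
```

For example, `macro_f1([0, 1, 1], [0, 1, 2])` averages per-class F1 scores of 1, 2/3 and 0, giving 5/9 ≈ 0.556. Macro averaging weights every country or province equally, so rare dialect labels count as much as frequent ones, which helps explain why absolute scores on these subtasks stay low.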

Code implementations: 1 repo