CL, Oct 13, 2021

Morphosyntactic Tagging with Pre-trained Language Models for Arabic and its Dialects

arXiv:2110.06852v2640 citations
Originality: Incremental advance
AI Analysis

This work addresses accurate morphosyntactic tagging for Arabic and its dialects, a task crucial to natural language processing applications in these languages. Its contribution is incremental: it applies fine-tuning strategies to existing pre-trained models.

The paper tackles morphosyntactic tagging for Arabic and its dialects by fine-tuning pre-trained transformer language models, achieving state-of-the-art results with absolute improvements over previous systems of 2.6% in Modern Standard Arabic, 2.8% in Gulf, 1.6% in Egyptian, and 8.3% in Levantine.

We present state-of-the-art results on morphosyntactic tagging across different varieties of Arabic using fine-tuned pre-trained transformer language models. Our models consistently outperform existing systems in Modern Standard Arabic and all the Arabic dialects we study, achieving 2.6% absolute improvement over the previous state-of-the-art in Modern Standard Arabic, 2.8% in Gulf, 1.6% in Egyptian, and 8.3% in Levantine. We explore different training setups for fine-tuning pre-trained transformer language models, including training data size, the use of external linguistic resources, and the use of annotated data from other dialects in a low-resource scenario. Our results show that strategic fine-tuning using datasets from other high-resource dialects is beneficial for a low-resource dialect. Additionally, we show that high-quality morphological analyzers as external linguistic resources are beneficial especially in low-resource settings.
