CL · Jun 28, 2023

Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models

Zaid Alyafeai, Maged S. Alshaibani, Badr AlKhamissi, Hamzah Luqman, Ebrahim Alareqi, Ali Fadel

arXiv:2306.16322v1 · 4.325 citations · h-index: 20 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work benchmarks ChatGPT models on Arabic NLP tasks and releases tools for researchers, but it is incremental because it applies existing methods to a new language.

The study evaluated GPT-3.5 and GPT-4 on seven Arabic NLP tasks and found that GPT-4 outperformed GPT-3.5 on five of them; it also includes a detailed analysis of sentiment analysis on a dialectal dataset.

Large language models (LLMs), including ChatGPT, a chat-based model built on top of LLMs such as GPT-3.5 and GPT-4, have demonstrated impressive performance on various downstream tasks without requiring fine-tuning. Although other languages make up a smaller share of their training data than English does, these models also exhibit remarkable capabilities in those languages. In this study, we assess the performance of GPT-3.5 and GPT-4 on seven distinct Arabic NLP tasks: sentiment analysis, translation, transliteration, paraphrasing, part-of-speech tagging, summarization, and diacritization. Our findings reveal that GPT-4 outperforms GPT-3.5 on five of the seven tasks. Furthermore, we conduct an extensive analysis of the sentiment analysis task, providing insights into how LLMs achieve exceptional results on a challenging dialectal dataset. Additionally, we introduce a new Python interface, https://github.com/ARBML/Taqyim, that makes these tasks easy to evaluate.
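The abstract does not say how predictions are scored. As a minimal sketch, assuming a sentiment task with string labels and accuracy plus macro-averaged F1 as metrics (assumptions, not necessarily the metrics used in the paper), model outputs could be compared with gold labels as below. The function names and example labels are hypothetical and are not part of the Taqyim interface.

```python
def accuracy(gold, pred):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        # Count true positives, false positives, and false negatives for this class.
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Hypothetical gold labels and model predictions, for illustration only.
    gold = ["positive", "negative", "neutral", "positive"]
    pred = ["positive", "negative", "positive", "positive"]
    print(accuracy(gold, pred))            # 0.75
    print(round(macro_f1(gold, pred), 3))  # 0.6
```

Macro-F1 weights each class equally, so a model that never predicts a minority class (here `neutral`) is penalized even when its accuracy looks high.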

