CL · Feb 28, 2025

Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin?

arXiv:2502.20973v2 · 2 citations · h-index: 3
Originality: Synthesis-oriented
AI Analysis

The paper addresses the challenge of translating informal Arabizi for gisting purposes. The contribution is incremental, since it applies existing LLMs to a new linguistic domain.

This study evaluated the ability of large language models (LLMs) to translate Arabizi, an informal form of Arabic written in Latin characters and numbers, into Modern Standard Arabic and English, assessing performance across multiple Arabic dialects with both human evaluation and automatic metrics.

In this era of rapid technological advancement, communication continues to evolve as new linguistic phenomena emerge. Among these is Arabizi, a hybrid form of Arabic that uses Latin characters and numbers to represent the spoken dialects of Arab communities. Arabizi is widely used on social media and allows people to communicate in an informal and dynamic way, but it poses significant challenges for machine translation because it lacks a formal structure and carries deeply embedded cultural nuances. This case study arises from a growing need to translate Arabizi for gisting purposes. It evaluates the capacity of different LLMs to decode and translate Arabizi, focusing on multiple Arabic dialects that have rarely been studied until now. Using a combination of human evaluators and automatic metrics, this research project investigates the models' performance in translating Arabizi into both Modern Standard Arabic and English. Key questions explored include which dialects are translated most effectively and whether translations into English surpass those into Arabic.

