A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic
This work addresses a domain-specific problem for researchers and NLP practitioners working with historical Judeo-Arabic texts. Its contribution is incremental, since it builds on existing transliteration methods.
The paper tackles the transliteration of Judeo-Arabic from Hebrew script into Arabic script with a two-step approach: character-level mapping followed by post-correction. It also presents the first benchmark evaluation of LLMs on this task and shows that transliteration lets Arabic NLP tools handle tasks such as morphosyntactic tagging and machine translation.
Judeo-Arabic refers to Arabic variants historically spoken by Jewish communities across the Arab world, primarily during the Middle Ages. Unlike standard Arabic, it is written in Hebrew script by Jewish writers and for Jewish audiences. Transliterating Judeo-Arabic into Arabic script is challenging due to ambiguous letter mappings, inconsistent orthographic conventions, and frequent code-switching into Hebrew and Aramaic. In this paper, we introduce a two-step approach to automatically transliterate Judeo-Arabic into Arabic script: simple character-level mapping followed by post-correction to address grammatical and orthographic errors. We also present the first benchmark evaluation of LLMs on this task. Finally, we show that transliteration enables Arabic NLP tools to perform morphosyntactic tagging and machine translation, which would not have been feasible on the original texts.