CL · Oct 11, 2020

Automated Prediction of Medieval Arabic Diacritics

Khalid Alnajjar, Mika Hämäläinen, Niko Partanen, Jack Rueter

arXiv:2010.05269v1 · 0.73 citations

Originality Synthesis-oriented

AI Analysis

This work addresses automated diacritization for Medieval Arabic. The contribution is incremental: it builds on existing methods, with a focus on optimizing context size.

The study tackled the problem of diacritizing Medieval Arabic text by using a character-level neural machine translation approach with an LSTM-based bi-directional RNN architecture, resulting in improved performance over an online baseline tool.

This study uses a character-level neural machine translation approach, trained with a long short-term memory-based bi-directional recurrent neural network architecture, for diacritization of Medieval Arabic. The results improve on the online tool used as a baseline. A diacritization model has been published openly through an easy-to-use Python package available on PyPI and Zenodo. We have found that context size should be considered when optimizing a feasible prediction model.
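As a rough illustration of how diacritization can be framed as character-level translation, the sketch below builds (undiacritized source, diacritized target) pairs from diacritized text, grouped into windows of a chosen number of words. It is not the authors' preprocessing: the function names, the `_` word-boundary marker, and the non-overlapping windowing are all assumptions made for illustration, and `context_size` here simply means the number of words per training example.

```python
# Arabic diacritic marks: U+064B (fathatan) through U+0652 (sukun),
# plus U+0670 (dagger alif). This set is an assumption, not the paper's list.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(text):
    """Remove diacritic marks, leaving the bare consonantal text."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def to_char_tokens(text):
    """Split text into space-separated characters.

    Word boundaries are marked with "_" (a hypothetical convention)
    so they survive the character-level tokenization.
    """
    return " ".join("_" if ch == " " else ch for ch in text)


def make_pairs(diacritized, context_size):
    """Build (source, target) training pairs from diacritized text.

    Each pair covers `context_size` consecutive words; the final pair
    may be shorter. Windows do not overlap (an assumption).
    """
    words = diacritized.split()
    pairs = []
    for i in range(0, len(words), context_size):
        target = " ".join(words[i:i + context_size])
        source = strip_diacritics(target)
        pairs.append((to_char_tokens(source), to_char_tokens(target)))
    return pairs
```

Calling `make_pairs(text, 1)` yields one word per example, while larger values give the model more surrounding words per example. Varying this parameter is one simple way to explore the effect of context size that the abstract mentions.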
