Hybrid approaches for automatic vowelization of Arabic texts
This work addresses the challenge of automatic vowelization in Arabic, which is crucial for enhancing natural language processing tasks, but it appears to be an incremental improvement over existing methods.
The authors tackled the problem of automatic vowelization of Arabic texts by combining a morphological analyzer with hidden Markov models to disambiguate possible vowelizations, achieving improved performance for natural language processing applications.
Hybrid approaches for automatic vowelization of Arabic texts are presented in this article. The process is made up of two modules. In the first one, a morphological analysis of the text words is performed using the open source morphological Analyzer AlKhalil Morpho Sys. Outputs for each word analyzed out of context, are its different possible vowelizations. The integration of this Analyzer in our vowelization system required the addition of a lexical database containing the most frequent words in Arabic language. Using a statistical approach based on two hidden Markov models (HMM), the second module aims to eliminate the ambiguities. Indeed, for the first HMM, the unvowelized Arabic words are the observed states and the vowelized words are the hidden states. The observed states of the second HMM are identical to those of the first, but the hidden states are the lists of possible diacritics of the word without its Arabic letters. Our system uses Viterbi algorithm to select the optimal path among the solutions proposed by Al Khalil Morpho Sys. Our approach opens an important way to improve the performance of automatic vowelization of Arabic texts for other uses in automatic natural language processing.