Build Fast and Accurate Lemmatization for Arabic
This work addresses the need for improved lemmatization to enhance Arabic information retrieval, providing a practical tool for researchers and practitioners in natural language processing.
The paper tackles the challenge of building a fast and accurate lemmatizer for Arabic, which has complex morphology. It introduces a new dataset for testing lemmatization accuracy and an algorithm that outperforms state-of-the-art methods in both accuracy and speed.
In this paper we describe the complexity of building a lemmatizer for Arabic, which has a rich and complex derivational morphology, and we discuss the need for fast and accurate lemmatization to enhance Arabic Information Retrieval (IR) results. We also introduce a new data set that can be used to test lemmatization accuracy, and an efficient lemmatization algorithm that outperforms state-of-the-art Arabic lemmatization in terms of accuracy and speed. We make the data set and the code publicly available.