CL, Oct 18, 2017

Build Fast and Accurate Lemmatization for Arabic

arXiv:1710.06700v1 39.21088 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for improved lemmatization to enhance Arabic information retrieval, providing a practical tool for researchers and practitioners in natural language processing.

The paper tackles the challenge of building a fast and accurate lemmatizer for Arabic, which has complex morphology, and introduces a new dataset and algorithm that outperform state-of-the-art methods in both accuracy and speed.

In this paper we describe the complexity of building a lemmatizer for Arabic, which has a rich and complex derivational morphology, and we discuss the need for fast and accurate lemmatization to enhance Arabic Information Retrieval (IR) results. We also introduce a new data set that can be used to test lemmatization accuracy, and an efficient lemmatization algorithm that outperforms state-of-the-art Arabic lemmatization in terms of accuracy and speed. We make the data set and the code publicly available.
