CL, Oct 2, 2013

Rule Based Stemmer in Urdu

arXiv:1310.0581v1, 25 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the shortage of language processing tools for Urdu. The contribution is incremental, however: it applies an existing rule-based approach to a new language.

The paper tackles stemming in Urdu, a language with complex morphology, by developing a rule-based stemmer for information retrieval. The stemmer's output is evaluated by having it verified by a human expert.

Urdu is a combination of several languages, such as Arabic, Hindi, English, Turkish and Sanskrit. It has a complex and rich morphology, which is why relatively little work has been done on Urdu language processing. Stemming reduces a word to its root form by separating its prefixes and suffixes from it. It is useful in search engines, natural language processing, word processing, spell checkers, word parsing, and word frequency and count studies. This paper presents a rule-based stemmer for Urdu that is intended for use in information retrieval. We have also evaluated its results by having them verified by a human expert.
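The abstract describes stemming as separating prefixes and suffixes from a word. A minimal Python sketch of that idea follows. It assumes a small hypothetical list of Urdu affixes (`PREFIXES`, `SUFFIXES`), a longest-match-first rule, a minimum stem length (`MIN_STEM_LEN`), and a prefix-then-suffix order. None of these names or rules come from the paper. The helper `accuracy` is also hypothetical: it models the evaluation against a human expert as simple exact-match accuracy over expert-supplied stems.

```python
# Minimal rule-based affix-stripping stemmer (illustrative sketch only).
# ASSUMPTION: the affix lists below are hypothetical examples chosen for
# illustration; they are NOT the rule set used in the paper.
from typing import Iterable, Tuple

PREFIXES = ["بے", "نا", "بد"]            # e.g. be- "without", na- (negation), bad- "bad"
SUFFIXES = ["وں", "یں", "ات", "ی", "ے"]   # e.g. plural/oblique endings, Arabic-style plural -aat

# ASSUMPTION: never strip an affix if fewer than this many letters would remain.
MIN_STEM_LEN = 2


def strip_prefix(word: str) -> str:
    """Remove at most one prefix, trying longer prefixes first."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM_LEN:
            return word[len(p):]
    return word


def strip_suffix(word: str) -> str:
    """Remove at most one suffix, trying longer suffixes first."""
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM_LEN:
            return word[: -len(s)]
    return word


def stem(word: str) -> str:
    """Strip one prefix, then one suffix (this ordering is an assumption)."""
    return strip_suffix(strip_prefix(word.strip()))


def accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Hypothetical evaluation: fraction of (word, expert_stem) pairs where
    stem(word) exactly matches the stem given by a human expert."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = sum(1 for word, gold in pairs if stem(word) == gold)
    return correct / len(pairs)


if __name__ == "__main__":
    # Print the stem produced for a few sample words.
    for w in ["کتابوں", "خیالات", "ناممکن", "نام"]:
        print(w, "->", stem(w))
```

The minimum-length guard keeps a short word such as نام ("name") from being cut down by the نا prefix rule. A practical rule set would need many more affixes and exceptions. It would likely also need to normalize Unicode variants of letters such as yeh (ی vs ي) before matching.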
