Rule Based Stemmer in Urdu
This paper addresses the shortage of language processing tools for Urdu, but the contribution is incremental: it applies an existing rule-based method to this specific domain.
The paper tackles stemming in Urdu, a language with complex morphology, by developing a rule-based stemmer for information retrieval; the results were verified by a human expert.
Urdu combines elements of several languages, including Arabic, Hindi, English, Turkish, and Sanskrit. It has a rich and complex morphology, which is why little work has been done in Urdu language processing. Stemming reduces a word to its root form by separating the prefix and suffix from the word. It is useful in search engines, natural language processing, word processing, spell checking, word parsing, and word frequency and count studies. This paper presents a rule-based stemmer for Urdu, intended for use in information retrieval. We evaluated the results by having them verified by a human expert.
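To make the idea of affix stripping concrete, the following is a minimal Python sketch of a rule-based stemmer. It is not the rule set from the paper: the affix lists (`PREFIXES`, `SUFFIXES`), the minimum stem length (`MIN_STEM_LEN`), and the function name `stem` are all assumptions chosen for illustration, and a practical Urdu stemmer would need far larger lists plus exception handling.

```python
# Illustrative affix lists only; these are assumptions, not the paper's rules.
PREFIXES = ("بد", "نا")   # e.g. "bad-", "na-"
SUFFIXES = ("یں", "وں")   # two common plural endings

# Assumed guard: never strip an affix if fewer than this many characters remain.
MIN_STEM_LEN = 3


def stem(word: str) -> str:
    """Strip at most one listed prefix and one listed suffix from `word`."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    # "kitaben" (books), "badnaam" (infamous), "naam" (name)
    for w in ("کتابیں", "بدنام", "نام"):
        print(w, "->", stem(w))
```

The length guard is one simple way to limit over-stemming: the word "نام" begins with a listed prefix, but stripping it would leave a single character, so the word is returned unchanged.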