Overview of Stemming Algorithms for Indian and Non-Indian Languages
This is an incremental overview that addresses the need for stemming in information retrieval systems to reduce index file sizes and improve text processing efficiency.
The paper reviews various stemming algorithms for both Indian and non-Indian languages, discussing their methods, accuracy, and errors as a pre-processing step in text mining and natural language processing.
Stemming is a pre-processing step in Text Mining applications as well as a very common requirement of Natural Language Processing functions. Stemming is the process of reducing inflected words to their stem. Its main purpose is to reduce the different grammatical forms of a word, such as its noun, adjective, verb, and adverb forms, to a root form. Stemming is widely used in Information Retrieval systems, where it reduces the size of index files. More generally, the goal of stemming is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form. In this paper we discuss different stemming algorithms for non-Indian and Indian languages, the methods of stemming, and their accuracy and errors.
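To make the idea concrete, the following is a minimal sketch of a suffix-stripping stemmer and of how stemming can shrink an index. It is an illustration only and not one of the algorithms reviewed in this paper: the suffix list, the minimum stem length, and the names `naive_stem` and `build_index` are assumptions chosen for the example.

```python
# Hypothetical English suffix list, ordered longest first so that the
# longest matching suffix is removed. Not taken from the paper.
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "es", "s")

# Assumed lower bound on stem length, to avoid stripping very short words.
MIN_STEM_LEN = 3


def naive_stem(word):
    """Lowercase the word and strip the first matching suffix, if any."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stem to the set of document ids in which it occurs."""
    index = {}
    for doc_id, text in enumerate(docs):
        for token in text.split():
            index.setdefault(naive_stem(token), set()).add(doc_id)
    return index


if __name__ == "__main__":
    docs = ["connected connecting", "connects"]
    # Three surface forms collapse to a single index entry.
    print(build_index(docs))
```

In this sketch the forms "connected", "connecting", and "connects" all map to the stem "connect", so the index holds one entry instead of three. The same rules also show the two usual kinds of stemming error: "connections" is reduced only to "connection" and is not merged with "connect" (under-stemming), while "news" is reduced to "new" and is wrongly merged with an unrelated word (over-stemming).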