CL, Oct 2, 2013

Stemmers for Tamil Language: Performance Analysis

arXiv:1310.0754v1, 19 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses stemming challenges for Tamil NLP applications, but it is incremental as it builds on existing rule-based methods.

The authors address stemming for Tamil, a language with rich morphology, by proposing a rule-based light stemmer. Compared against a suffix removal stemmer, the light stemmer performs better and is reported to be more effective in Information Retrieval Systems.

Stemming is the process of extracting the root word from a given inflected word, and it plays a significant role in numerous applications of Natural Language Processing (NLP). The Tamil language raises several challenges for NLP, since it has richer morphological patterns than other languages. This paper proposes a rule-based light stemmer to find the stem of a given inflected Tamil word. The performance of the proposed approach is compared with that of a rule-based suffix removal stemmer in terms of correctly and incorrectly predicted stems. The experimental results clearly show that the proposed light stemmer for Tamil performs better than the suffix removal stemmer and is also more effective in an Information Retrieval System (IRS).
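
To make the comparison criterion concrete, the following is a minimal sketch of a suffix-stripping stemmer scored by counts of correctly and incorrectly predicted stems. It is not the paper's implementation: the suffix list, the transliterated example words, the gold stems, and the names `strip_suffix` and `score` are all hypothetical and chosen only for illustration.

```python
# Hypothetical, transliterated suffixes for illustration only;
# these are NOT the rule set used in the paper.
SUFFIXES = ("kalil", "kal", "il", "ai")


def strip_suffix(word, suffixes=SUFFIXES, min_stem_len=3):
    """Remove the longest matching suffix, keeping at least min_stem_len characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no rule applied


def score(stemmer, gold_pairs):
    """Return (correct, incorrect) counts for a stemmer over (word, stem) pairs."""
    correct = sum(1 for word, stem in gold_pairs if stemmer(word) == stem)
    return correct, len(gold_pairs) - correct


# Toy gold standard (assumed, not taken from the paper's data).
gold = [
    ("naadukal", "naadu"),
    ("naadukalil", "naadu"),
    ("naattil", "naadu"),  # stem change that plain suffix stripping misses
]

correct, incorrect = score(strip_suffix, gold)
print(correct, incorrect)
```

The third pair shows the kind of case where pure suffix removal fails: the stem itself changes under inflection, so stripping a suffix alone cannot recover it. Any second stemmer can be passed to `score` in the same way to compare the two on identical data.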
