IR CL Dec 17, 2013

Generation, Implementation and Appraisal of an N-gram based Stemming Algorithm

arXiv:1312.4824v2, 9 citations

Originality: Synthesis-oriented

AI Analysis

This work offers an incremental improvement for natural language processing tasks by providing a more robust stemming method.

The authors tackled the problem of language-independent stemming by developing an N-gram based algorithm that addresses issues like stems starting with intermediate characters, and found it performs comparably to Porter's Stemmer.

A language-independent stemmer has long been sought. The single N-gram tokenization technique works well; however, it often generates stems that start with intermediate characters rather than initial ones. We present a novel technique that takes the concept of N-gram stemming one step further, and we compare our method with an established algorithm in the field, Porter's Stemmer. Results indicate that our N-gram stemmer is not inferior to Porter's linguistic stemmer.
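The problem the abstract describes, stems that begin at an intermediate character, can be illustrated with a minimal sketch. The abstract does not say how a single N-gram is chosen or how the authors' technique works, so the selection rule here (keep the N-gram that occurs in the fewest vocabulary words) is an assumption made only for illustration. The function names `char_ngrams`, `rarest_ngram_stem`, and `prefix_stem` and the toy vocabulary are hypothetical.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return the contiguous character n-grams of word, in left-to-right order."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def rarest_ngram_stem(word, n, vocabulary):
    """Pick the n-gram of word that appears in the fewest vocabulary words.

    Assumed selection rule, for illustration only; ties go to the leftmost n-gram.
    """
    doc_freq = Counter()
    for entry in vocabulary:
        # Count each n-gram once per word (document-frequency style).
        doc_freq.update(set(char_ngrams(entry, n)))
    grams = char_ngrams(word, n)
    best = min(range(len(grams)), key=lambda i: (doc_freq[grams[i]], i))
    return grams[best]


def prefix_stem(word, n):
    """Baseline that always keeps the initial n characters of word."""
    return word[:n]


# Toy vocabulary, invented for this example.
vocabulary = ["stemming", "stemmer", "stems", "running"]

print(char_ngrams("stemming", 4))
print(rarest_ngram_stem("stemming", 4, vocabulary))
print(prefix_stem("stemming", 4))
```

On this toy vocabulary, the rarest 4-gram of "stemming" is "emmi", which starts at the third character, while the prefix baseline returns "stem". This only demonstrates the failure mode. It is not a reconstruction of the paper's algorithm.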
