B. P. Pande

2 Papers

IR, Dec 24, 2013
Suffix Stripping Problem as an Optimization Problem

B. P. Pande, Pawan Tamta, H. S. Dhami

Stemming, or suffix stripping, is an important part of modern Information Retrieval systems: it finds the root word (stem) of a given cluster of words. Existing algorithms for this problem have been developed in an ad hoc manner. In this work, we model it as an optimization problem. An Integer Program (IP) is developed to overcome the shortcomings of existing approaches. Sample results of the proposed method are compared with an established technique in the field for the English language. AMPL code for the IP is also given.

IR, Dec 17, 2013
Generation, Implementation and Appraisal of an N-gram based Stemming Algorithm

B. P. Pande, Pawan Tamta, H. S. Dhami

A language-independent stemmer has long been sought. The single N-gram tokenization technique works well; however, it often generates stems that start with intermediate characters rather than initial ones. We present a novel technique that takes the concept of N-gram stemming one step further and compare our method with an established algorithm in the field, Porter's stemmer. Results indicate that our N-gram stemmer is not inferior to Porter's linguistic stemmer.
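
The shortcoming mentioned in this abstract can be illustrated with a small sketch. It is not the authors' method. It assumes that single N-gram stemming replaces each word with its rarest character N-gram, and it uses a toy word list in place of real corpus statistics. The function names `char_ngrams` and `single_ngram_stems` are hypothetical.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return all contiguous character n-grams of `word`, left to right."""
    if len(word) < n:
        return [word]  # word too short to split: keep it whole
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def single_ngram_stems(words, n=4):
    """Map each word to its rarest n-gram (assumed selection rule).

    Rarity is measured as the number of distinct words in `words`
    that contain the n-gram, a toy stand-in for document frequency.
    """
    freq = Counter()
    for w in set(words):
        freq.update(set(char_ngrams(w, n)))
    # min() returns the first minimal element, so ties are broken
    # in favour of the leftmost n-gram.
    return {w: min(char_ngrams(w, n), key=lambda g: freq[g]) for w in words}


words = ["connect", "connected", "connection", "protect"]
stems = single_ngram_stems(words, n=4)
for w in words:
    print(w, "->", stems[w])
```

On this toy list, "connected" is mapped to "ecte" and "connection" to "ecti". Both pseudo-stems begin in the middle of the word, which is the behaviour the abstract describes.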