CL · May 9, 2022

XSTEM: An exemplar-based stemming algorithm

arXiv:2205.04355v2 · 1 citation · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses stemming for natural language processing applications and is assessed as an incremental improvement over existing stemming methods.

The paper tackles the problem of stemming by introducing XSTEM, a fast and configurable algorithm that combines lookup tables and rule-based methods to achieve high precision and recall while handling unknown words effectively.

Stemming is the process of reducing related words to a standard form by removing affixes from them. Existing algorithms vary with respect to their complexity, configurability, handling of unknown words, and ability to avoid under- and over-stemming. This paper presents a fast, simple, configurable, high-precision, high-recall stemming algorithm that combines the simplicity and performance of word-based lookup tables with the strong generalizability of rule-based methods to avert problems with out-of-vocabulary words.
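The idea of pairing a word-based lookup table with rule-based fallback can be illustrated with a minimal sketch. This is not the XSTEM algorithm itself, whose details are not given here; the exemplar table, the suffix rules, the minimum stem length, and the name `stem` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical exemplar table mapping known words to their stems.
EXEMPLARS: Dict[str, str] = {
    "running": "run",
    "ran": "run",
    "studies": "study",
}

# Hypothetical suffix rules, tried in order: (suffix, replacement).
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("ies", "y"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]

# Assumed guard against over-stemming very short words.
MIN_STEM_LENGTH = 3


def stem(word: str) -> str:
    """Return a stem: exact table lookup first, suffix rules as fallback."""
    key = word.lower()

    # Known words are resolved directly from the lookup table.
    if key in EXEMPLARS:
        return EXEMPLARS[key]

    # Out-of-vocabulary words fall back to the first rule that applies
    # and leaves a stem of acceptable length.
    for suffix, replacement in SUFFIX_RULES:
        if key.endswith(suffix):
            candidate = key[: len(key) - len(suffix)] + replacement
            if len(candidate) >= MIN_STEM_LENGTH:
                return candidate

    # No rule applied, so the word is returned unchanged.
    return key
```

In this sketch the table handles irregular forms such as "ran" that no suffix rule could recover, while the rules cover words the table has never seen. The length guard is one simple way to limit over-stemming; a real system would likely need a more careful criterion.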

