CL · Mar 12, 2014

HPS: a hierarchical Persian stemming method

arXiv:1403.2837v1 · 5 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and accurate stemming in Persian language processing, representing an incremental improvement over existing methods.

The paper tackles Persian stemming with a hierarchical method driven by each word's part of speech. It uses hash tables and deterministic finite automata to remove prefixes and suffixes, and reports an average accuracy of 95.37% on test sets from the Hamshahri collection and security news.

This paper presents a novel hierarchical Persian stemming approach based on the part of speech (POS) of a word in its sentence. The implemented stemmer uses hash tables and several deterministic finite automata (DFA) at different levels of its hierarchy to remove the prefixes and suffixes of words. The hash tables serve two purposes. First, the DFA do not support some special words, and a hash table partly solves this problem. Second, a hash table lookup speeds up the stemmer by skipping the time the DFA would otherwise need. Because of its hierarchical organization, the method is fast and flexible. Our experiments on test sets from the Hamshahri collection and security news (istna.ir) show an average accuracy of 95.37%, which improves further when the method is applied to a test set with common topics.
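
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: a hash table lookup first, then POS-specific automata that strip affixes. It is not the authors' stemmer. The POS tags ("N", "V"), the affix lists, the exception entry, the `min_stem` threshold, and all function names are assumptions made for illustration, and the words are Latin transliterations rather than Persian script.

```python
from typing import Dict, List, Tuple

# A DFA here is (transitions, accepting): transitions[state][char] -> next state,
# accepting[state] -> length of the affix recognised on reaching that state.
DFA = Tuple[List[Dict[str, int]], Dict[int, int]]

# Level 1: hash table of special words (hypothetical entry, transliterated).
EXCEPTIONS: Dict[str, str] = {
    "tanha": "tanha",  # "alone": ends in "ha" but is not a plural
}


def build_dfa(affixes: List[str], from_end: bool) -> DFA:
    """Build a trie-shaped DFA; suffix automata read the word right to left."""
    transitions: List[Dict[str, int]] = [{}]  # state 0 is the start state
    accepting: Dict[int, int] = {}
    for affix in affixes:
        state = 0
        for ch in (reversed(affix) if from_end else affix):
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting[state] = len(affix)
    return transitions, accepting


def strip_affix(word: str, dfa: DFA, from_end: bool, min_stem: int = 3) -> str:
    """Remove the longest recognised affix that leaves at least min_stem characters."""
    transitions, accepting = dfa
    state, best = 0, 0
    for consumed, ch in enumerate(reversed(word) if from_end else word, start=1):
        if ch not in transitions[state]:
            break
        state = transitions[state][ch]
        if state in accepting and len(word) - consumed >= min_stem:
            best = consumed
    if best == 0:
        return word
    return word[:-best] if from_end else word[best:]


# Level 2: one set of automata per part of speech (illustrative affix lists only).
NOUN_SUFFIX_DFA = build_dfa(["ha", "an"], from_end=True)
VERB_PREFIX_DFA = build_dfa(["mi", "be", "na"], from_end=False)
VERB_SUFFIX_DFA = build_dfa(["am", "i", "ad", "im", "id", "and"], from_end=True)


def stem(word: str, pos: str) -> str:
    # The hash table is consulted first, so special words never reach the DFA.
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    if pos == "N":
        return strip_affix(word, NOUN_SUFFIX_DFA, from_end=True)
    if pos == "V":
        word = strip_affix(word, VERB_PREFIX_DFA, from_end=False)
        return strip_affix(word, VERB_SUFFIX_DFA, from_end=True)
    return word  # other POS tags are left untouched in this sketch
```

In this sketch, `stem("ketabha", "N")` returns `"ketab"` and `stem("miravam", "V")` returns `"rav"`. The word `"tanha"` is returned unchanged because the hash table catches it before the noun automaton can strip the final `"ha"`, which mirrors the two roles the abstract assigns to hash tables: covering words the DFA mishandles and avoiding the automaton pass altogether.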
