CL · Feb 26, 2015

Rational Kernels for Arabic Stemming and Text Classification

arXiv:1502.07504v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses Arabic stemming and text classification, an incremental advance in language-specific text processing methods.

The paper tackles Arabic stemming and text classification. It introduces a pattern-based stemmer that models Arabic patterns as transducers and does not rely on a dictionary, then applies rational kernels to classify the resulting document representations. On the Saudi Press Agency dataset, the results are reported as promising relative to other approaches, especially in accuracy, recall, and F1.

In this paper, we address the problems of Arabic text classification and stemming using transducers and rational kernels. We introduce a new stemming technique based on Arabic patterns (Pattern Based Stemmer). Patterns are modelled using transducers, and stemming is done without depending on any dictionary. With transducer-based stemming, documents are transformed into finite-state transducers. This document representation allows us to use and explore rational kernels as a framework for Arabic text classification. Stemming experiments are conducted on three word collections, and classification experiments are done on the Saudi Press Agency dataset. Results show that our approach, when compared with other approaches, is promising, especially in terms of accuracy, recall, and F1.
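To make the idea of dictionary-free, pattern-based stemming concrete, here is a minimal sketch. It is not the authors' implementation: plain string matching stands in for the transducers, and the names `PATTERNS`, `match_pattern`, and `stem` as well as the three example patterns are assumptions chosen for illustration. The only linguistic convention it relies on is the standard one in which the letters ف, ع, ل in a pattern mark the three root slots and every other pattern character must appear literally in the word.

```python
# Hypothetical sketch of pattern-based root extraction without a dictionary.
# Plain string matching is used here in place of finite-state transducers.

# The letters that mark the three root slots inside a pattern.
ROOT_SLOTS = "فعل"

# Illustrative pattern list (an assumption, not the paper's pattern set).
PATTERNS = ["مفعول", "فاعل", "فعال"]


def match_pattern(word, pattern):
    """Return the three-letter root if `word` fits `pattern`, else None."""
    if len(word) != len(pattern):
        return None
    root = {}
    for word_char, pattern_char in zip(word, pattern):
        if pattern_char in ROOT_SLOTS:
            # Slot position: the word's letter here belongs to the root.
            root[pattern_char] = word_char
        elif word_char != pattern_char:
            # Literal position: the word must carry the same letter.
            return None
    return "".join(root[slot] for slot in ROOT_SLOTS)


def stem(word, patterns=PATTERNS):
    """Try each pattern in order; return the word unchanged if none fits."""
    for pattern in patterns:
        root = match_pattern(word, pattern)
        if root is not None:
            return root
    return word


if __name__ == "__main__":
    for w in ["مكتوب", "كاتب", "كتاب"]:
        print(w, "->", stem(w))
```

With this toy pattern list, the three sample words (written, writer, book) all reduce to the same root كتب. A real pattern set would also have to handle prefixes, suffixes, and words that fit several patterns, which this sketch does not attempt.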

