IR, LG · Jun 2, 2012

A Route Confidence Evaluation Method for Reliable Hierarchical Text Categorization

arXiv:1206.0335v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses reliable text categorization over hierarchical class structures. It is an incremental advance, since it builds on existing Local Classifier per Node (LCN) approaches.

The paper tackles the open problem of embedding hierarchical information to improve Hierarchical Text Categorization (HTC) systems. It proposes a confidence evaluation method for the route selected in the hierarchy; rejecting a small percentage of low-reliability samples improved overall categorization accuracy on the Reuters benchmark dataset.

Hierarchical Text Categorization (HTC) is becoming increasingly important with the rapidly growing amount of text data available on the World Wide Web. Among the different strategies proposed to cope with HTC, the Local Classifier per Node (LCN) approach attains good performance by mirroring the underlying class hierarchy while enforcing a top-down strategy in the testing step. However, the problem of embedding hierarchical information (the parent-child relationship) to improve the performance of HTC systems remains open. A confidence evaluation method for a selected route in the hierarchy is proposed to evaluate the reliability of the final candidate labels in an HTC system. To exploit the information embedded in the hierarchy, weight factors account for the importance of each level. An acceptance/rejection strategy in the top-down decision-making process is proposed, which improves the overall categorization accuracy by rejecting a small percentage of samples, i.e., those with a low reliability score. Experimental results on the Reuters benchmark dataset (RCV1-v2) confirm the effectiveness of the proposed method compared to other state-of-the-art HTC methods.
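
The top-down routing and acceptance/rejection idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's scoring formula, so several pieces here are assumptions for illustration only: the route confidence is taken to be a weighted average of the per-level classifier scores, the descent is greedy (best child at each level), and all names (`top_down_route`, `route_confidence`, `classify_or_reject`, `weights`, `threshold`) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: each node maps to its list of children.
Hierarchy = Dict[str, List[str]]
# Hypothetical per-node classifier: (node, document) -> score in [0, 1].
Scorer = Callable[[str, str], float]


def top_down_route(hierarchy: Hierarchy, root: str, score: Scorer,
                   doc: str) -> List[Tuple[str, float]]:
    """Greedy top-down descent: at each level keep the best-scoring child."""
    route: List[Tuple[str, float]] = []
    node = root
    while hierarchy.get(node):  # stop at a leaf (no children listed)
        scored = [(child, score(child, doc)) for child in hierarchy[node]]
        best = max(scored, key=lambda pair: pair[1])
        route.append(best)
        node = best[0]
    return route


def route_confidence(scores: Sequence[float],
                     weights: Sequence[float]) -> float:
    """Weighted average of per-level scores (an assumed aggregation rule)."""
    if len(weights) < len(scores):
        raise ValueError("need one weight per level of the route")
    used = weights[:len(scores)]
    total = sum(used)
    if not scores or total == 0:
        return 0.0
    return sum(w * s for w, s in zip(used, scores)) / total


def classify_or_reject(hierarchy: Hierarchy, root: str, score: Scorer,
                       doc: str, weights: Sequence[float],
                       threshold: float) -> Tuple[Optional[List[str]], float]:
    """Return (labels, confidence); labels is None when the route is rejected."""
    route = top_down_route(hierarchy, root, score, doc)
    conf = route_confidence([s for _, s in route], weights)
    if conf < threshold:
        return None, conf  # low-reliability sample: reject
    return [node for node, _ in route], conf
```

In this sketch, raising `threshold` rejects more samples and keeps only routes whose weighted score is high, which mirrors the accuracy-versus-rejection trade-off the abstract describes. How the weights are chosen per level is left open here.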
