IR · Jul 5, 2013

Graph-based Approach to Automatic Taxonomy Generation (GraBTax)

arXiv:1307.1718v2 · 15 citations
Originality: Incremental advance
AI Analysis

GraBTax addresses the need for efficient taxonomy generation in domains such as computer science, but it appears incremental because it builds on existing graph-based methods.

The authors address the problem of automatically generating concept hierarchies from large text corpora. They propose GraBTax, a graph-based approach that combines statistical co-occurrences with lexical similarity, and evaluate it on articles, primarily computer science, from CiteSeerX. The quality of the resulting hierarchy is assessed by human judges and by comparison with Wikipedia categories.

We propose a novel graph-based approach for constructing a concept hierarchy from a large text corpus. Our algorithm, GraBTax, incorporates both statistical co-occurrences and lexical similarity in optimizing the structure of the taxonomy. To automatically generate topic-dependent taxonomies from a large text corpus, GraBTax first extracts topical terms and their relationships from the corpus. The algorithm then constructs a weighted graph representing topics and their associations. A graph partitioning algorithm is then used to recursively partition the topic graph into a taxonomy. For evaluation, we apply GraBTax to articles, primarily computer science, in the CiteSeerX digital library and search engine. The quality of the resulting concept hierarchy is assessed both by human judges and by comparison with Wikipedia categories.
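The pipeline described in the abstract (extract terms, build a weighted topic graph, partition it recursively) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The weighting coefficients `alpha` and `beta`, the Jaccard word-overlap measure, the choice of the most frequent term as each subtree's root, and the greedy two-way split standing in for the graph partitioning algorithm are all assumptions made for illustration. The toy `docs` input is invented and assumes topical terms have already been extracted.

```python
from collections import Counter
from itertools import combinations


def lexical_similarity(a, b):
    """Jaccard overlap of the words in two terms (an assumed measure)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)


def build_graph(docs, alpha=1.0, beta=1.0):
    """Build a weighted topic graph from per-document term sets.

    Edge weight = alpha * co-occurrence count + beta * lexical similarity.
    alpha and beta are hypothetical mixing coefficients.
    """
    counts, cooc = Counter(), Counter()
    for terms in docs:
        counts.update(terms)
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    weights = {}
    for a, b in combinations(sorted(counts), 2):
        w = alpha * cooc[(a, b)] + beta * lexical_similarity(a, b)
        if w > 0:
            weights[(a, b)] = w
    return counts, weights


def weight(weights, a, b):
    """Look up an undirected edge weight (0.0 if there is no edge)."""
    return weights.get((a, b) if a < b else (b, a), 0.0)


def bisect(terms, weights):
    """Greedy two-way split, a simple stand-in for a real graph partitioner.

    Seeds the two sides with the most weakly linked pair, then attaches
    each remaining term to the side it is more strongly connected to.
    """
    terms = sorted(terms)
    s1, s2 = min(combinations(terms, 2), key=lambda p: weight(weights, *p))
    left, right = [s1], [s2]
    for t in terms:
        if t in (s1, s2):
            continue
        wl = sum(weight(weights, t, u) for u in left)
        wr = sum(weight(weights, t, u) for u in right)
        (left if wl >= wr else right).append(t)
    return left, right


def build_taxonomy(terms, counts, weights):
    """Recursively partition terms into a (root, [subtrees]) tree."""
    terms = sorted(terms)
    root = max(terms, key=lambda t: counts[t])  # most frequent term as root
    rest = [t for t in terms if t != root]
    if len(rest) < 2:
        groups = [rest] if rest else []
    else:
        groups = bisect(rest, weights)
    return (root, [build_taxonomy(g, counts, weights) for g in groups])


# Toy input: each document is the set of topical terms extracted from it.
docs = [
    {"machine learning", "neural network", "deep neural network"},
    {"machine learning", "neural network"},
    {"machine learning", "database", "query optimization"},
    {"machine learning", "database", "database index"},
    {"database", "query optimization"},
]
counts, weights = build_graph(docs)
tree = build_taxonomy(counts.keys(), counts, weights)
print(tree)
```

On this toy input the most frequent term becomes the root, and the database-related and neural-network-related terms fall into separate subtrees. A production system would replace `bisect` with a proper graph partitioning routine and would tune how co-occurrence and lexical similarity are combined.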

