CL · Apr 2, 2022

Constrained Sequence-to-Tree Generation for Hierarchical Text Classification

arXiv:2204.00811v2 · 33 citations · h-index: 8
Originality: Highly original
AI Analysis

This work addresses the label inconsistency problem in hierarchical text classification; the contribution is incremental in that it builds on prior flat classification methods.

The paper tackles the hierarchical text classification problem by formulating it as a sequence generation task and introducing a sequence-to-tree framework with constrained decoding to address label inconsistency, achieving significant and consistent improvements on three benchmark datasets.

Hierarchical Text Classification (HTC) is a challenging task in which a document can be assigned to multiple hierarchically structured categories within a taxonomy. The majority of prior studies treat HTC as a flat multi-label classification problem, which inevitably leads to the "label inconsistency" problem. In this paper, we formulate HTC as a sequence generation task and introduce a sequence-to-tree framework (Seq2Tree) for modeling the hierarchical label structure. Moreover, we design a constrained decoding strategy with a dynamic vocabulary to secure the label consistency of the results. Compared with previous works, the proposed approach achieves significant and consistent improvements on three benchmark datasets.
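To make the idea of constrained decoding with a dynamic vocabulary more concrete, the sketch below shows one way such a constraint could work. It is not the authors' implementation: the toy taxonomy, the `<pop>` token, the depth-first traversal, and the `toy_score` function (a stand-in for a trained decoder's token scores) are all assumptions made for illustration. At each step the decoder may only choose among the children of the node it is currently expanding, so a label can never be emitted without its parent.

```python
from typing import Callable, Dict, List, Set

ROOT = "<root>"
POP = "<pop>"  # hypothetical token: finish the current node, return to its parent

# Toy taxonomy with hypothetical labels: each node maps to its children.
TAXONOMY: Dict[str, List[str]] = {
    ROOT: ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["tennis"],
    "physics": [],
    "biology": [],
    "tennis": [],
}


def allowed_tokens(stack: List[str], emitted: Set[str]) -> List[str]:
    """Dynamic vocabulary: unvisited children of the current node, plus POP."""
    children = [c for c in TAXONOMY[stack[-1]] if c not in emitted]
    return children + [POP]


def constrained_decode(
    score: Callable[[List[str], str], float], max_steps: int = 50
) -> List[str]:
    """Greedy depth-first decoding restricted to the allowed tokens."""
    stack = [ROOT]
    emitted: List[str] = []
    for _ in range(max_steps):
        candidates = allowed_tokens(stack, set(emitted))
        token = max(candidates, key=lambda t: score(stack, t))
        if token == POP:
            stack.pop()
            if not stack:  # popped the root: decoding is finished
                break
        else:
            emitted.append(token)
            stack.append(token)
    return emitted


def is_consistent(labels: List[str]) -> bool:
    """Every predicted label must have its parent predicted too (or be top-level)."""
    parent = {c: p for p, cs in TAXONOMY.items() for c in cs}
    return all(parent[l] == ROOT or parent[l] in labels for l in labels)


# Stand-in for model scores (an assumption, not a trained decoder).
RELEVANCE = {"science": 0.9, "physics": 0.8, "biology": 0.2,
             "sports": 0.1, "tennis": 0.95}


def toy_score(stack: List[str], token: str) -> float:
    return 0.5 if token == POP else RELEVANCE[token]


predicted = constrained_decode(toy_score)
print(predicted, is_consistent(predicted))
```

In this toy run, `tennis` has the highest raw score but is never offered to the decoder, because its parent `sports` was not selected. A flat multi-label classifier thresholding the same scores at 0.5 would output `tennis` without `sports`, which is the kind of label inconsistency the abstract describes.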
