CL, AI, LG | Dec 17, 2024

Your Next State-of-the-Art Could Come from Another Domain: A Cross-Domain Analysis of Hierarchical Text Classification

arXiv:2412.12744v11.0 | h-index: 37 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a gap in hierarchical text classification research across domains like healthcare and law, offering insights for method design, though it is incremental in nature.

The paper tackles the lack of cross-domain understanding in hierarchical text classification with a comprehensive overview and empirical analysis, and it reports new state-of-the-art results obtained by applying techniques outside their original domains.

Text classification with hierarchical labels is a prevalent and challenging task in natural language processing. Examples include assigning ICD codes to patient records, tagging patents into IPC classes, assigning EUROVOC descriptors to European legal texts, and more. Despite its widespread applications, a comprehensive understanding of state-of-the-art methods across different domains has been lacking. In this paper, we provide the first comprehensive cross-domain overview with empirical analysis of state-of-the-art methods. We propose a unified framework that positions each method within a common structure to facilitate research. Our empirical analysis yields key insights and guidelines, confirming the necessity of learning across different research areas to design effective methods. Notably, under our unified evaluation pipeline, we achieved new state-of-the-art results by applying techniques beyond their original domains.

