CL | AI | Apr 4, 2023

MEGClass: Extremely Weakly Supervised Text Classification via Mutually-Enhancing Text Granularities

arXiv:2304.01969v2132 citations | h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses the problem of costly annotations for text classification, particularly in specialized domains, by proposing an incremental improvement over existing extremely weakly supervised approaches.

The paper tackles text classification with extremely weak supervision using only class names, addressing issues of inter-granularity disagreements by jointly considering documents, sentences, and words. It introduces MEGClass, which outperforms other methods on seven benchmark datasets.

Text classification is essential for organizing unstructured text. Traditional methods rely on human annotations or, more recently, a set of class seed words for supervision, which can be costly, particularly for specialized or emerging domains. To address this, using class surface names alone as extremely weak supervision has been proposed. However, existing approaches treat different levels of text granularity (documents, sentences, or words) independently, disregarding inter-granularity class disagreements and the context identifiable exclusively through joint extraction. In order to tackle these issues, we introduce MEGClass, an extremely weakly-supervised text classification method that leverages Mutually-Enhancing Text Granularities. MEGClass utilizes coarse- and fine-grained context signals obtained by jointly considering a document's most class-indicative words and sentences. This approach enables the learning of a contextualized document representation that captures the most discriminative class indicators. By preserving the heterogeneity of potential classes, MEGClass can select the most informative class-indicative documents as iterative feedback to enhance the initial word-based class representations and ultimately fine-tune a pre-trained text classifier. Extensive experiments on seven benchmark datasets demonstrate that MEGClass outperforms other weakly and extremely weakly supervised methods.
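The interplay between sentence-level and document-level signals described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 2-D vectors stand in for embeddings from a pre-trained language model, and the function names, the softmax temperature, and the `top_k` parameter are all assumptions made for illustration. The sketch weights each sentence by how confidently it points to a single class, builds a document vector from those weights, and then folds the most confident documents back into the class representations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(scores, temperature=0.1):
    """Turn similarity scores into a class distribution (temperature is an assumption)."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_mean(vectors, weights):
    """Weighted average of a list of vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) / total for i in range(dim)]


def score_document(sentence_vecs, class_vecs):
    """Build a document vector that favours class-indicative sentences."""
    sent_dists = [softmax([cosine(s, c) for c in class_vecs]) for s in sentence_vecs]
    # A sentence that points clearly at one class gets a larger weight.
    weights = [max(dist) for dist in sent_dists]
    doc_vec = weighted_mean(sentence_vecs, weights)
    doc_dist = softmax([cosine(doc_vec, c) for c in class_vecs])
    return doc_vec, doc_dist


def refine_class_vectors(docs, class_vecs, top_k=1):
    """Average each class vector with its top_k most confident documents."""
    scored = [score_document(d, class_vecs) for d in docs]
    labels = [dist.index(max(dist)) for _, dist in scored]
    refined = []
    for k, class_vec in enumerate(class_vecs):
        members = [scored[i] for i, label in enumerate(labels) if label == k]
        members.sort(key=lambda item: item[1][k], reverse=True)
        picked = [class_vec] + [vec for vec, _ in members[:top_k]]
        refined.append(weighted_mean(picked, [1.0] * len(picked)))
    return refined, labels, scored


# Hypothetical class-name embeddings: index 0 = "sports", index 1 = "politics".
class_vecs = [[1.0, 0.0], [0.0, 1.0]]
# Each document is a list of hypothetical sentence embeddings.
docs = [
    [[0.9, 0.1], [0.8, 0.3], [0.2, 0.2]],  # mostly sports, one neutral sentence
    [[0.1, 0.9], [0.3, 0.7]],              # mostly politics
    [[0.6, 0.5], [0.4, 0.6]],              # mixed signals
]
refined, labels, scored = refine_class_vectors(docs, class_vecs)
```

In a fuller pipeline, the refinement step would presumably be repeated for several iterations, and the final confident documents would serve as pseudo-labeled data for fine-tuning a classifier, as the abstract describes.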

Code Implementations: 1 repo