CL · LG · Dec 4, 2017

Topics and Label Propagation: Best of Both Worlds for Weakly Supervised Text Classification

arXiv:1712.02767v1 · 4 citations
Originality: Incremental advance
AI Analysis

This method addresses text classification for domains with limited labeled data, though it is incremental as it combines existing techniques.

The authors tackled weakly supervised text classification by proposing a label propagation algorithm that integrates topic modeling to reduce supervision needs, achieving sufficiently high accuracy with only a few manually labeled topics on various datasets.

We propose a Label Propagation-based algorithm for weakly supervised text classification. We construct a graph in which each document is represented by a node and edge weights represent similarities among the documents. Additionally, we discover underlying topics using Latent Dirichlet Allocation (LDA) and enrich the document graph by including the topics as additional nodes. The edge weight between a topic and a text document represents the level of "affinity" between them. Our approach does not require document-level labelling; instead, it expects manual labels only for topic nodes. This significantly reduces the level of supervision needed, as only a few labelled topics are observed to be enough to achieve sufficiently high accuracy. The Label Propagation Algorithm is then run on this enriched graph to propagate labels among the nodes. Our approach combines the advantages of Label Propagation (through document-document similarities) and Topic Modelling (for minimal but smart supervision). We demonstrate the effectiveness of our approach on various datasets and compare it with state-of-the-art weakly supervised text classification approaches.
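
The enriched-graph idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the affinity weights, or the exact propagation variant, so the code below assumes a standard iterative label propagation with clamped seed nodes, and all names (`propagate_topic_labels`, `doc_sim`, `doc_topic`, `topic_labels`) and the toy numbers are hypothetical. In practice, `doc_topic` might come from LDA document-topic proportions and `doc_sim` from something like cosine similarity.

```python
import numpy as np


def propagate_topic_labels(doc_sim, doc_topic, topic_labels, n_classes, n_iter=200):
    """Propagate labels from labelled topic nodes to document nodes.

    doc_sim:      (n_docs, n_docs) symmetric, non-negative document similarities
    doc_topic:    (n_docs, n_topics) non-negative document-topic affinities
    topic_labels: length n_topics; class index per topic, or -1 if unlabelled
    Returns the predicted class index for every document.
    """
    doc_sim = np.asarray(doc_sim, dtype=float)
    doc_topic = np.asarray(doc_topic, dtype=float)
    topic_labels = np.asarray(topic_labels)
    n_docs, n_topics = doc_topic.shape
    n = n_docs + n_topics

    # Enriched graph: document nodes first, then one extra node per topic.
    W = np.zeros((n, n))
    W[:n_docs, :n_docs] = doc_sim
    np.fill_diagonal(W, 0.0)           # drop self-loops
    W[:n_docs, n_docs:] = doc_topic    # document -> topic edges
    W[n_docs:, :n_docs] = doc_topic.T  # topic -> document edges

    # Row-normalise the weights into transition probabilities.
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    P = W / row_sums

    # Only topic nodes carry seed labels; documents start unlabelled.
    labelled = np.where(topic_labels >= 0)[0]
    seed_rows = n_docs + labelled
    Y = np.zeros((n, n_classes))
    Y[seed_rows, topic_labels[labelled]] = 1.0

    F = Y.copy()
    for _ in range(n_iter):
        F = P @ F                     # each node averages its neighbours' scores
        F[seed_rows] = Y[seed_rows]   # clamp the manually labelled topics
    return F[:n_docs].argmax(axis=1)


# Toy data (made up): documents 0-1 and 2-3 form two similar pairs.
doc_sim = [[1.0, 0.9, 0.1, 0.0],
           [0.9, 1.0, 0.0, 0.1],
           [0.1, 0.0, 1.0, 0.9],
           [0.0, 0.1, 0.9, 1.0]]
# Documents 1 and 3 have an uninformative topic mix, so they can only be
# classified through their similarity to documents 0 and 2.
doc_topic = [[0.9, 0.1],
             [0.5, 0.5],
             [0.1, 0.9],
             [0.5, 0.5]]
topic_labels = [0, 1]  # topic 0 labelled as class 0, topic 1 as class 1

predicted = propagate_topic_labels(doc_sim, doc_topic, topic_labels, n_classes=2)
print(predicted.tolist())
```

In this toy setup no document is labelled directly. Documents 1 and 3 receive their class only through document-document edges, which is the complementary role the abstract attributes to Label Propagation alongside topic-level supervision.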
