IR · CL · LG · ML · Sep 2, 2018

Weakly-Supervised Neural Text Classification

arXiv:1809.01478v2 · 197 citations
Originality: Incremental advance
AI Analysis

This work addresses data scarcity in text classification for real-world applications, but it is an incremental advance because it builds on existing weakly-supervised approaches.

The paper tackles the lack of training data in neural text classification with a weakly-supervised method that uses seed information to generate pseudo-labeled documents for pre-training and then self-trains on real unlabeled data for refinement. On three real-world datasets, it significantly outperforms baseline methods.
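The summary says only that seed information is used to generate pseudo-labeled documents. As a rough illustration, the sketch below assumes the seed information is a few keywords per class and that word embeddings are available. Everything else is hypothetical and not taken from the paper: each class is represented by the mean direction of its seed-word embeddings, its word distribution is a softmax over cosine similarity restricted to the `top_k` nearest words, each pseudo-document mixes that distribution with a background distribution (`mix`), and labels are smoothed with `alpha` instead of being one-hot. The names `class_word_distribution` and `generate_pseudo_docs` are illustrative only.

```python
import numpy as np


def class_word_distribution(emb, seed_ids, top_k=5, temperature=0.1):
    """Word distribution concentrated near one class's seed words.

    Hypothetical simplification: softmax over cosine similarity to the mean
    seed direction, keeping only the top_k most similar vocabulary words.
    """
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    center = unit[seed_ids].mean(axis=0)
    center /= np.linalg.norm(center)
    sims = unit @ center                      # cosine similarity of every word
    top = np.argsort(-sims)[:top_k]           # indices of the most similar words
    probs = np.zeros(len(emb))
    weights = np.exp(sims[top] / temperature)
    probs[top] = weights / weights.sum()
    return probs


def generate_pseudo_docs(emb, seeds, background, n_docs, doc_len,
                         alpha=0.2, mix=0.3, rng=None):
    """Generate n_docs pseudo-documents (as word-index arrays) per class.

    seeds: one list of seed-word indices per class (assumed supervision type).
    background: (V,) background word distribution summing to 1.
    mix: probability mass given to the background distribution.
    alpha: label smoothing, label = (1 - alpha) * one_hot + alpha / m.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = len(seeds)
    docs, labels = [], []
    for j, seed_ids in enumerate(seeds):
        word_dist = mix * background + (1 - mix) * class_word_distribution(emb, seed_ids)
        for _ in range(n_docs):
            docs.append(rng.choice(len(emb), size=doc_len, p=word_dist))
            label = np.full(m, alpha / m)     # smoothed, not one-hot
            label[j] += 1 - alpha
            labels.append(label)
    return np.array(docs), np.array(labels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(20, 8))            # toy random "embeddings"
    background = np.full(20, 1 / 20)          # uniform background distribution
    docs, labels = generate_pseudo_docs(emb, [[0, 1], [2, 3]], background,
                                        n_docs=3, doc_len=10, rng=rng)
    print(docs.shape, labels.shape)           # (6, 10) (6, 2)
```

The smoothed labels keep the pre-trained model from becoming overconfident on synthetic text, which should leave room for the later refinement on real unlabeled documents; treat this rationale as an assumption of the sketch.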

Deep neural networks are increasingly popular for the classic text classification task because of their strong expressive power and reduced need for feature engineering. Despite this appeal, neural text classification models suffer from a lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised text classification models exist, they cannot easily be applied to deep neural models and support only limited types of supervision. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural text classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method can handle different types of weak supervision and can easily be integrated into existing deep neural models for text classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our method achieves strong performance without requiring excessive training data and significantly outperforms baseline methods.
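The abstract describes the self-training module only as bootstrapping on real unlabeled data. A minimal sketch under stated assumptions: a linear softmax classifier stands in for the pre-trained deep neural model; the target distribution q_ij ∝ p_ij² / f_j, with f_j = Σ_i p_ij, is the sharpening target used in deep embedded clustering (DEC), adopted here as an assumption rather than as this paper's confirmed formula; and training stops once fewer than `tol` of the documents change their predicted label. The names `self_train`, `inner_steps`, and `tol` are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def target_distribution(p):
    """Sharpened targets: q_ij proportional to p_ij^2 / f_j, f_j = sum_i p_ij."""
    weight = p ** 2 / p.sum(axis=0)           # square, then divide by soft class frequency
    return weight / weight.sum(axis=1, keepdims=True)


def self_train(W, X, lr=0.5, inner_steps=10, max_iters=50, tol=0.001):
    """Refine a linear softmax model (weights W) on unlabeled features X."""
    prev = softmax(X @ W).argmax(axis=1)
    for _ in range(max_iters):
        q = target_distribution(softmax(X @ W))   # targets fixed for this round
        for _ in range(inner_steps):
            p = softmax(X @ W)
            # Gradient of the mean cross-entropy -sum q log p (i.e. KL(q || p)) w.r.t. W.
            W = W - lr * X.T @ (p - q) / len(X)
        labels = softmax(X @ W).argmax(axis=1)
        changed = np.mean(labels != prev)     # fraction of documents that switched class
        prev = labels
        if changed < tol:
            break
    return W
```

Because q is held fixed within a round, minimizing KL(q || p) has the same gradient as cross-entropy against soft targets, which is why the update is the familiar softmax gradient p − q. For example, rows [0.6, 0.4] and [0.5, 0.5] map to [0.648, 0.352] and [0.45, 0.55]: confident predictions are sharpened, and the frequency term f_j nudges ties toward the less frequent class.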

Code implementations: 1 repo