Discriminative Topic Mining via Category-Name Guided Text Embedding
This work addresses the need for topic mining in text analysis that is more effective and better aligned with user intent, though the contribution is incremental, since it builds on existing topic modeling with minimal user guidance.
The authors tackle the problem of mining topics from text corpora that align with user-provided category names. They propose CatE, a method that learns a discriminative embedding space to discover representative terms, and report improved performance on tasks such as weakly-supervised classification.
Mining a set of meaningful and distinctive topics automatically from massive text corpora has broad applications. Existing topic models, however, typically work in a purely unsupervised way, and so often generate topics that do not fit users' particular needs and yield suboptimal performance on downstream tasks. We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora. This new task not only helps users understand clearly and distinctively the topics they are most interested in, but also directly benefits keyword-driven classification tasks. We develop CatE, a novel category-name guided text embedding method for discriminative topic mining, which effectively leverages minimal user guidance to learn a discriminative embedding space and discover category-representative terms in an iterative manner. We conduct a comprehensive set of experiments to show that CatE mines a high-quality set of topics guided by category names only, and benefits a variety of downstream applications, including weakly-supervised classification and lexical entailment direction identification.
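To make the idea of iteratively discovering category-representative terms more concrete, the following is a minimal sketch and not the CatE algorithm itself. It assumes the word embeddings are already given and fixed, whereas the abstract describes learning the embedding space jointly with term discovery. The toy 2-D vectors, the function name `mine_terms`, the `rounds` parameter, and the scoring rule (similarity to the category's own centroid minus similarity to the closest rival centroid) are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mine_terms(embeddings, category_names, rounds=2):
    """Hypothetical helper: grow one term list per category, one term per round.

    Each category starts from its own name only. In every round, a category
    adds the unclaimed word that is closest to its current centroid and
    farthest from the closest rival centroid (an assumed, simplified notion
    of "discriminative").
    """
    topics = {name: [name] for name in category_names}
    claimed = set(category_names)
    for _ in range(rounds):
        for name in category_names:
            centroid = mean_vector([embeddings[t] for t in topics[name]])
            rivals = [
                mean_vector([embeddings[t] for t in topics[other]])
                for other in category_names
                if other != name
            ]
            best_word, best_score = None, -math.inf
            for word, vec in embeddings.items():
                if word in claimed:
                    continue
                own = cosine(vec, centroid)
                rival = max((cosine(vec, r) for r in rivals), default=0.0)
                score = own - rival
                if score > best_score:
                    best_word, best_score = word, score
            if best_word is not None:
                topics[name].append(best_word)
                claimed.add(best_word)
    return topics


# Toy, hand-made embeddings (assumed for illustration only).
toy_embeddings = {
    "sports": [1.0, 0.0],
    "politics": [0.0, 1.0],
    "soccer": [0.9, 0.1],
    "tennis": [0.8, 0.2],
    "election": [0.1, 0.9],
    "senate": [0.2, 0.8],
}

topics = mine_terms(toy_embeddings, ["sports", "politics"], rounds=2)
for category, terms in topics.items():
    print(category, terms)
```

The only user input here is the list of category names, mirroring the "category names only" guidance described above. Subtracting the rival similarity is one simple way to favor terms that separate categories rather than terms that are merely frequent or generic; other scoring rules would fit the same loop.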