CL

May 13, 2022

Weakly Supervised Text Classification using Supervision Signals from a Language Model

Ziqian Zeng, Weimin Ni, Tianqing Fang, Xiang Li, Xinran Zhao, Yangqiu Song

Tencent

arXiv:2205.06604v1

31.9631 citations

h-index: 52

Has Code

Originality: Incremental advance

AI Analysis

This paper addresses text classification when human annotations are scarce. The contribution is incremental, since it builds on existing language model techniques.

The paper tackles weakly supervised text classification by using a masked language model to generate supervision signals from cloze-style prompts, achieving performance gains of 2-4% over baselines on three datasets.

Solving text classification in a weakly supervised manner is important for real-world applications where human annotations are scarce. In this paper, we propose to query a masked language model with cloze-style prompts to obtain supervision signals. We design a prompt that combines the document itself with the sentence "this article is talking about [MASK]." The masked language model generates words for the [MASK] token, and these words, which summarize the content of the document, can be used as supervision signals. We propose a latent variable model that simultaneously learns a word distribution learner, which associates the generated words with pre-defined categories, and a document classifier, without using any annotated data. Evaluation on three datasets, AGNews, 20Newsgroups, and UCINews, shows that our method outperforms baselines by 2%, 4%, and 3%, respectively.
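The prompting step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(word, score)` return format, and the `toy_fill_mask` stand-in are all assumptions made for this example, and the latent variable model that maps generated words to categories is not shown.

```python
from typing import Callable, List, Tuple

# Prompt wording taken from the abstract; appending it after the
# document text is an assumption about how the two are combined.
PROMPT_SUFFIX = "this article is talking about [MASK]."

# Hypothetical interface: a mask filler takes a prompt and returns
# (candidate word, score) pairs for the [MASK] position.
MaskFiller = Callable[[str], List[Tuple[str, float]]]


def build_cloze_prompt(document: str, suffix: str = PROMPT_SUFFIX) -> str:
    """Combine the document with the cloze-style suffix."""
    return f"{document.strip()} {suffix}"


def signal_words(document: str, fill_mask: MaskFiller, top_k: int = 3) -> List[str]:
    """Return the top_k highest-scoring words proposed for [MASK]."""
    predictions = fill_mask(build_cloze_prompt(document))
    ranked = sorted(predictions, key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:top_k]]


def toy_fill_mask(prompt: str) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for a masked language model.

    Scores each candidate word by counting cue words in the prompt,
    so the example runs without downloading a model.
    """
    cues = {
        "sports": ["team", "goal", "match"],
        "business": ["stocks", "market", "shares"],
    }
    text = prompt.lower()
    return [(word, float(sum(text.count(c) for c in cs))) for word, cs in cues.items()]


if __name__ == "__main__":
    doc = "The team scored a late goal to win the match."
    print(build_cloze_prompt(doc))
    print(signal_words(doc, toy_fill_mask, top_k=1))
```

In practice, `toy_fill_mask` would be replaced by a real masked language model, for example a thin wrapper around a Hugging Face `fill-mask` pipeline that converts its `token_str` and `score` fields into `(word, score)` pairs. Which model and how many generated words to keep are choices this listing does not specify.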
