CL, May 13, 2022

Weakly Supervised Text Classification using Supervision Signals from a Language Model

Tencent
arXiv:2205.06604v1631 citations, h-index: 52
Originality: Incremental advance
AI Analysis

This paper addresses text classification when human annotations are scarce. The advance is incremental because it builds on existing language model techniques.

The paper tackles weakly supervised text classification by using a masked language model to generate supervision signals from cloze-style prompts, achieving performance gains of 2-4% over baselines on three datasets.

Solving text classification in a weakly supervised manner is important for real-world applications where human annotations are scarce. In this paper, we propose to query a masked language model with cloze-style prompts to obtain supervision signals. We design a prompt that combines the document itself with "this article is talking about [MASK]." A masked language model can generate words for the [MASK] token. The generated words, which summarize the content of the document, can be used as supervision signals. We propose a latent variable model that simultaneously learns a word distribution learner, which associates generated words with pre-defined categories, and a document classifier, without using any annotated data. Evaluation on three datasets, AGNews, 20Newsgroups, and UCINews, shows that our method can outperform baselines by 2%, 4%, and 3%, respectively.
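The prompting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names `build_prompt`, `signal_words`, `vote_category`, `stub_fill_mask`, and the seed-word sets are hypothetical and introduced only for illustration. The masked language model is passed in as a plain callable so the example runs without any model download. The seed-word vote is a deliberately simple stand-in for the paper's latent variable model, which learns the word-to-category association instead of relying on fixed seeds.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Set

MASK = "[MASK]"  # mask token of BERT-style models; other models may use a different one


def build_prompt(document: str, mask_token: str = MASK) -> str:
    """Append the cloze template quoted in the abstract to the document."""
    return f"{document.strip()} this article is talking about {mask_token}."


def signal_words(document: str,
                 fill_mask: Callable[[str], Sequence[str]]) -> List[str]:
    """Ask a masked LM (any callable: prompt -> predicted words) to fill the mask."""
    return [w.strip().lower() for w in fill_mask(build_prompt(document))]


def vote_category(words: Sequence[str],
                  seeds: Dict[str, Set[str]]) -> Optional[str]:
    """Toy stand-in for the latent variable model: count seed-word hits per category."""
    hits = Counter()
    for category, seed_words in seeds.items():
        hits[category] = sum(1 for w in words if w in seed_words)
    best, count = hits.most_common(1)[0]
    return best if count > 0 else None


def stub_fill_mask(prompt: str) -> List[str]:
    """Hypothetical stand-in for a real masked LM; returns canned predictions."""
    assert MASK in prompt
    return ["Sports", "football", "politics"]


# Hypothetical seed words, not taken from the paper.
SEEDS = {
    "sports": {"sports", "football", "soccer"},
    "politics": {"politics", "election"},
}

words = signal_words("The striker scored twice in the final.", stub_fill_mask)
label = vote_category(words, SEEDS)
print(words, label)
```

To try this with a real model, one option is to wrap the Hugging Face `fill-mask` pipeline in a function that returns the `token_str` field of each prediction and pass that function as `fill_mask`. Which model and how many predictions to keep per document are choices the abstract does not specify.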

Code Implementations: 1 repo
