LG | Oct 18, 2023

Learning under Label Proportions for Text Classification

arXiv:2310.11707v1131 citations | h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses privacy-preserving and weakly supervised text classification, but the contribution is incremental because it builds on existing LLP methods.

The paper tackles text classification under Learning from Label Proportions (LLP), where only aggregate class proportions are available for each bag of samples. It proposes a novel robust formulation combined with a self-supervised objective and reports better results than the baselines in almost 87% of the experimental configurations.

We present one of the preliminary NLP works under the challenging setup of Learning from Label Proportions (LLP), where the data is provided in an aggregate form called bags and only the proportion of samples in each class is available as the ground truth. This setup is in line with the desired characteristics of training models under privacy settings and weak supervision. By characterizing some irregularities of the most widely used baseline technique, DLLP, we propose a novel formulation that is also robust. This is accompanied by a learnability result that provides a generalization bound under LLP. Combining this formulation with a self-supervised objective, our method achieves better results than the baselines in almost 87% of the experimental configurations, which include large-scale models for both long- and short-range texts across multiple metrics.
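
To make the LLP setup concrete, the following is a minimal sketch of a bag-level proportion-matching loss in the style usually attributed to the DLLP baseline: the model's per-sample class probabilities are averaged over a bag and compared with the bag's known label proportions. It is not the robust formulation proposed in the paper, which the abstract does not spell out. The function names, the use of cross-entropy (which differs from KL divergence only by a term that is constant in the model), and the toy logits are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert one sample's logits into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bag_proportion_loss(bag_logits, true_proportions, eps=1e-12):
    """Cross-entropy between a bag's label proportions and the
    mean predicted class distribution over the bag (hypothetical helper)."""
    probs = [softmax(sample) for sample in bag_logits]
    n = len(probs)
    k = len(true_proportions)
    # Average the per-sample predictions to get the predicted bag proportions.
    mean_pred = [sum(p[c] for p in probs) / n for c in range(k)]
    return -sum(q * math.log(mp + eps)
                for q, mp in zip(true_proportions, mean_pred))


# Toy bag of four samples and two classes; three samples lean towards class 0.
bag_logits = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 0.0]]
loss_match = bag_proportion_loss(bag_logits, [0.75, 0.25])
loss_mismatch = bag_proportion_loss(bag_logits, [0.25, 0.75])
print(loss_match < loss_mismatch)
```

Note that no individual label enters the loss: only the bag-level proportions supervise the model, which is what makes the setup attractive for privacy-sensitive and weakly supervised training.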

