CL · LG · Apr 11, 2021

Constructing Contrastive samples via Summarization for Text Classification with limited annotations

arXiv:2104.05094v3661 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of limited supervised data in text classification. Its contribution is domain-specific and appears incremental, since it builds on existing contrastive learning methods.

The paper tackles the problem of limited annotations in text classification by proposing a novel contrastive learning framework that uses text summarization for data augmentation and introduces Mixsum regularization. Experiments on four real-world datasets (Amazon-5, Yelp-5, AG News, and IMDb) demonstrate its effectiveness, though no specific numerical gains are provided in the abstract.

Contrastive learning has emerged as a powerful representation learning method and facilitates various downstream tasks, especially when supervised data is limited. How to construct efficient contrastive samples through data augmentation is key to its success. Unlike in vision tasks, data augmentation for contrastive learning has not been investigated sufficiently in language tasks. In this paper, we propose a novel approach to construct contrastive samples for language tasks using text summarization. We use these samples for supervised contrastive learning to obtain better text representations, which greatly benefit text classification tasks with limited annotations. To further improve the method, we mix up samples from different classes and add an extra regularization, named Mixsum, in addition to the cross-entropy loss. Experiments on real-world text classification datasets (Amazon-5, Yelp-5, AG News, and IMDb) demonstrate the effectiveness of the proposed contrastive learning framework with summarization-based data augmentation and Mixsum regularization.
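
The abstract does not give the loss in closed form, so the following is only a minimal sketch of a generic supervised contrastive (SupCon-style) loss, not the authors' implementation. It assumes each document and its summary have already been encoded into vectors and that a summary inherits the label of its source document. The function name `supervised_contrastive_loss`, the `temperature` default, and the toy embeddings are illustrative assumptions; the Mixsum regularizer is not sketched because its definition is not stated here.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic SupCon-style loss (assumed form, not taken from the paper).

    embeddings: (n, d) array of sentence vectors, n >= 2.
    labels:     (n,) array of integer class ids.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    n = z.shape[0]

    # L2-normalize so dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Exclude each anchor's similarity with itself from the softmax.
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: other samples in the batch that share the anchor's label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0  # no anchor has a positive partner in this batch

    # Average log-probability of the positives, per anchor with positives.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())


# Toy stand-ins for encoder outputs (hypothetical values, not real data).
doc_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])       # two documents
summary_vecs = np.array([[0.9, 0.1], [0.1, 0.9]])   # their summaries
batch = np.vstack([doc_vecs, summary_vecs])

# Each summary shares the label of its source document.
aligned = supervised_contrastive_loss(batch, [0, 1, 0, 1])
# Deliberately wrong pairing, for comparison only.
misaligned = supervised_contrastive_loss(batch, [0, 1, 1, 0])
```

In this toy batch the loss is lower when each summary is paired with its own document's label than when the labels are swapped, which is the behaviour a summary-as-augmentation scheme would rely on. In practice this term would be combined with the cross-entropy loss; how the paper weights the terms is not specified in the abstract.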

Code Implementations: 1 repo
