CL · LG · Oct 2, 2020

Data-Efficient Pretraining via Contrastive Self-Supervision

arXiv:2010.01061v4 · 21 citations
Originality: Incremental advance
AI Analysis

This paper addresses data and compute efficiency for NLP practitioners with limited resources. The advance is incremental, since it builds on existing contrastive and self-supervised methods.

The paper tackles resource-efficient learning in NLP by proposing a contrastive self-supervised text encoder pretrained on only 60MB of task-internal data. The authors report that it outperforms RoBERTa (pretrained on 160GB of task-external text) while completing both pretraining and fine-tuning in one-fifth of RoBERTa's fine-tuning time.

For natural language processing 'text-to-text' tasks, the prevailing approaches rely heavily on pretraining large self-supervised models on increasingly larger 'task-external' data. Transfer learning from high-resource pretraining works well, but research has focused on settings with very large data and compute requirements, while the potential of efficient low-resource learning, without large 'task-external' pretraining, remains under-explored. In this work, we evaluate against three core challenges for resource-efficient learning. Namely, we analyze: (1) pretraining data ($X$) efficiency; (2) zero- to few-shot label ($Y$) efficiency; and (3) long-tail generalization, since long-tail preservation has been linked to algorithmic fairness and because data in the tail is limited by definition. To address these challenges, we propose a data- and compute-efficient self-supervised, contrastive text encoder, pretrained on 60MB of 'task-internal' text data, and compare it to RoBERTa, which was pretrained on 160GB of 'task-external' text. We find our method outperforms RoBERTa, while pretraining and fine-tuning in 1/5th of RoBERTa's fine-tuning time.
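
The abstract does not spell out the training objective, so the following is only a generic illustration of what a contrastive loss looks like, not the paper's method. It is a minimal InfoNCE-style sketch in plain Python: an anchor embedding is pulled toward one positive and pushed away from several negatives. The function name `info_nce_loss`, the `temperature` value, and the toy 2-d vectors are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss for a single anchor (illustrative, hypothetical helper).

    The loss is the negative log-probability of picking the positive
    among [positive] + negatives under a softmax over cosine similarities.
    """
    a = normalize(anchor)
    candidates = [normalize(positive)] + [normalize(n) for n in negatives]
    logits = [dot(a, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidate logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    # The positive sits at index 0.
    return log_sum_exp - logits[0]


# Toy embeddings (assumed values): the anchor is close to the first positive.
anchor = [1.0, 0.0]
aligned_loss = info_nce_loss(anchor, [0.9, 0.1], [[-1.0, 0.0], [0.0, 1.0]])
misaligned_loss = info_nce_loss(anchor, [-1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
print(aligned_loss < misaligned_loss)
```

In this sketch the loss is small when the positive is the candidate most similar to the anchor and grows when a negative is more similar, which is the basic signal a contrastive encoder is trained on.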

