CL LG · Oct 2, 2020

Data-Efficient Pretraining via Contrastive Self-Supervision

arXiv:2010.01061v42.321 citations

Originality: Incremental advance

AI Analysis

This work addresses data and compute efficiency for NLP practitioners with limited resources, though the advance is incremental because it builds on existing contrastive and self-supervised methods.

The paper tackles resource-efficient learning in NLP by proposing a contrastive self-supervised text encoder pretrained on only 60MB of task-internal data. The encoder outperforms RoBERTa (pretrained on 160GB of task-external text) while completing both pretraining and fine-tuning in one-fifth of RoBERTa's fine-tuning time.

For natural language processing 'text-to-text' tasks, the prevailing approaches rely heavily on pretraining large self-supervised models on increasingly larger 'task-external' data. Transfer learning from high-resource pretraining works well, but research has focused on settings with very large data and compute requirements, while the potential of efficient low-resource learning, without large 'task-external' pretraining, remains under-explored. In this work, we evaluate against three core challenges for resource-efficient learning. Namely, we analyze: (1) pretraining data ($X$) efficiency; (2) zero- to few-shot label ($Y$) efficiency; and (3) long-tail generalization, since long-tail preservation has been linked to algorithmic fairness and because data in the tail is limited by definition. To address these challenges, we propose a data- and compute-efficient self-supervised, contrastive text encoder, pretrained on 60MB of 'task-internal' text data, and compare it to RoBERTa, which was pretrained on 160GB of 'task-external' text. We find that our method outperforms RoBERTa, while pretraining and fine-tuning in one-fifth of RoBERTa's fine-tuning time.
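The abstract does not spell out the training objective, so the following is only a generic sketch of what a contrastive objective over text embeddings can look like, not the authors' method. It assumes a binary formulation in which a text embedding is scored against matching (positive) and non-matching (negative) candidate embeddings via a dot product; the function name `binary_contrastive_loss` and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_contrastive_loss(text_vec, positives, negatives):
    """Mean binary cross-entropy over (text, candidate) pairs.

    Hypothetical helper: positives should receive a high match
    probability, negatives a low one.
    """
    eps = 1e-12  # guards against log(0)
    losses = []
    for p in positives:
        losses.append(-math.log(sigmoid(dot(text_vec, p)) + eps))
    for n in negatives:
        losses.append(-math.log(1.0 - sigmoid(dot(text_vec, n)) + eps))
    return sum(losses) / len(losses)


# Toy embeddings (made up for illustration only).
text = [1.0, 0.5, -0.2]
similar = [1.0, 0.4, -0.1]
dissimilar = [-1.0, -0.5, 0.3]

aligned = binary_contrastive_loss(text, [similar], [dissimilar])
misaligned = binary_contrastive_loss(text, [dissimilar], [similar])
print(aligned < misaligned)
```

The loss is lower when the text embedding sits close to its positive and far from its negative, which is the property a contrastive encoder is trained to achieve.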
