CL · LG · Nov 3, 2020

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

arXiv:2011.01403v3 · 598 citations
AI Analysis

This paper addresses inefficient fine-tuning for natural language understanding tasks, offering a method that improves model robustness and generalization with limited labeled data. The contribution is incremental, since it builds on existing contrastive learning ideas.

The paper tackles sub-optimal generalization and instability in fine-tuning pre-trained language models by proposing a supervised contrastive learning objective which, combined with cross-entropy loss, achieves significant improvements over a RoBERTa-Large baseline on multiple GLUE datasets in few-shot settings.

State-of-the-art natural language understanding classification models follow two stages: pre-training a large language model on an auxiliary task, and then fine-tuning the model on a task-specific labeled dataset using cross-entropy loss. However, the cross-entropy loss has several shortcomings that can lead to sub-optimal generalization and instability. Driven by the intuition that good generalization requires capturing the similarity between examples in one class and contrasting them with examples in other classes, we propose a supervised contrastive learning (SCL) objective for the fine-tuning stage. Combined with cross-entropy, our proposed SCL loss obtains significant improvements over a strong RoBERTa-Large baseline on multiple datasets of the GLUE benchmark in few-shot learning settings, without requiring specialized architecture, data augmentations, memory banks, or additional unsupervised data. Our proposed fine-tuning objective leads to models that are more robust to different levels of noise in the fine-tuning training data, and can generalize better to related tasks with limited labeled data.
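To make the idea of combining cross-entropy with an SCL term concrete, here is a minimal, dependency-free sketch. It assumes a commonly used form of the supervised contrastive loss (L2-normalized embeddings, a temperature `tau`, positives defined as other batch examples with the same label) and a convex weighting `lam` between the two terms. The function names, the default values of `tau` and `lam`, and the averaging over the batch are illustrative assumptions, not details taken from the text above.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length; fall back to 1.0 for the zero vector.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy(logits, labels):
    # Mean softmax cross-entropy over the batch (log-sum-exp for stability).
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[y]
    return total / len(labels)


def supervised_contrastive(embeddings, labels, tau=0.3):
    # Pull same-label embeddings together and push other labels apart.
    # tau is a hypothetical default; averaging over the batch is an assumption.
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total = 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Temperature-scaled dot products between anchor i and every other example.
        sims = {
            k: sum(a * b for a, b in zip(z[i], z[k])) / tau
            for k in range(n)
            if k != i
        }
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
    return total / n


def combined_loss(logits, embeddings, labels, lam=0.9, tau=0.3):
    # Convex combination of the two objectives; lam is an assumed hyperparameter.
    ce = cross_entropy(logits, labels)
    scl = supervised_contrastive(embeddings, labels, tau)
    return (1.0 - lam) * ce + lam * scl
```

In a real fine-tuning setup, `embeddings` would be the encoder's sentence representations and `logits` the classifier outputs for the same batch, computed with an autodiff framework rather than plain lists. The sketch only illustrates that the contrastive term needs nothing beyond the labeled batch itself.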

Code Implementations: 1 repo
