CL LG Feb 7, 2021

CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models

Yusheng Su, Xu Han, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Peng Li, Jie Zhou, Maosong Sun

arXiv:2102.03752v31.412 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of insufficient semantic feature capture for pre-trained language models in low-resource NLP tasks, offering an incremental improvement for practitioners working with limited labeled data.

This paper introduces CSS-LM, a contrastive semi-supervised learning framework for fine-tuning pre-trained language models in low-resource settings. It retrieves positive and negative instances from unlabeled corpora based on semantic relatedness and applies contrastive learning to both labeled and unlabeled data, achieving better results than conventional and supervised contrastive fine-tuning strategies in few-shot scenarios.

Fine-tuning pre-trained language models (PLMs) has recently proven effective on a variety of downstream NLP tasks. However, in many low-resource scenarios, conventional fine-tuning strategies cannot sufficiently capture the semantic features that matter for downstream tasks. To address this issue, we introduce a novel framework, CSS-LM, that improves the fine-tuning phase of PLMs via contrastive semi-supervised learning. Specifically, given a task, we retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to the task. We then perform contrastive semi-supervised learning on both the retrieved unlabeled instances and the original labeled instances to help PLMs capture crucial task-related semantic features. Experimental results show that CSS-LM outperforms the conventional fine-tuning strategy on a series of downstream tasks in few-shot settings, and also outperforms the latest supervised contrastive fine-tuning strategies. Our datasets and source code will be made available to provide more details.
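The retrieve-then-contrast idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes instances are already encoded as plain embedding vectors, collapses the paper's domain-level and class-level relatedness into a single cosine similarity, and uses a generic InfoNCE-style loss. The function names `retrieve` and `contrastive_loss`, the `temperature` value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(anchor, unlabeled, k):
    """Rank unlabeled embeddings by similarity to a labeled anchor.

    Returns (positive_indices, negative_indices): the k most similar and
    the k least similar unlabeled instances. Assumes 2 * k <= len(unlabeled)
    so the two sets do not overlap.
    """
    if 2 * k > len(unlabeled):
        raise ValueError("need at least 2 * k unlabeled instances")
    ranked = sorted(
        range(len(unlabeled)),
        key=lambda i: cosine(anchor, unlabeled[i]),
        reverse=True,
    )
    return ranked[:k], ranked[-k:]


def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss: pull positives toward the anchor, push negatives away."""
    neg_sum = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    total = 0.0
    for p in positives:
        pos_term = math.exp(cosine(anchor, p) / temperature)
        total += -math.log(pos_term / (pos_term + neg_sum))
    return total / len(positives)


# Toy data: one labeled anchor and four unlabeled embeddings (made up).
anchor = [1.0, 0.0]
unlabeled = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.8, 0.2]]

pos_idx, neg_idx = retrieve(anchor, unlabeled, k=1)
loss = contrastive_loss(
    anchor,
    [unlabeled[i] for i in pos_idx],
    [unlabeled[i] for i in neg_idx],
)
```

In a real fine-tuning loop the embeddings would come from the PLM encoder and the loss would be backpropagated through it, typically alongside the supervised objective on the labeled instances; the sketch only shows how retrieval and the contrastive objective fit together.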
