CL · Jun 4, 2021

Bi-Granularity Contrastive Learning for Post-Training in Few-Shot Scene

arXiv:2106.02327v1713 citations
Originality: Incremental advance
AI Analysis

The paper addresses the problem of adapting pre-trained models to downstream tasks with limited labeled data. The advance is incremental: it builds on existing contrastive learning and masking techniques.

The paper tackles the instability and low performance of fine-tuning pre-trained language models on scarce labeled data by proposing a post-training method that integrates token-level and sequence-level contrastive learning using complementary random masking. Empirical results show it surpasses recent post-training methods in few-shot settings without data augmentation.

The major paradigm for applying a pre-trained language model to downstream tasks is to fine-tune it on labeled task data, which often suffers from instability and low performance when labeled examples are scarce. One way to alleviate this problem is to apply post-training on unlabeled task data before fine-tuning, adapting the pre-trained model to target domains by contrastive learning that considers either token-level or sequence-level similarity. Inspired by the success of sequence masking, we argue that both token-level and sequence-level similarities can be captured with a pair of masked sequences. Therefore, we propose complementary random masking (CRM) to generate a pair of masked sequences from an input sequence for sequence-level contrastive learning, and then develop contrastive masked language modeling (CMLM) for post-training to integrate both token-level and sequence-level contrastive learning. Empirical results show that CMLM surpasses several recent post-training methods in few-shot settings without the need for data augmentation.
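To make the idea in the abstract concrete, here is a minimal sketch of how a pair of masked sequences might be generated and compared. It is not the authors' implementation. It assumes that "complementary" means the two views mask disjoint positions. The function names, the 15% mask ratio, the `[MASK]` string, and the choice of InfoNCE with cosine similarity for the sequence-level term are all illustrative assumptions; the abstract specifies none of them.

```python
import math
import random

MASK = "[MASK]"  # placeholder mask token (assumption)


def complementary_random_masking(tokens, mask_ratio=0.15, seed=None):
    """Return two masked copies of `tokens` whose masked positions never overlap.

    Assumption: "complementary" is read here as "disjoint mask positions".
    """
    rng = random.Random(seed)
    n = len(tokens)
    k = max(1, round(n * mask_ratio))  # positions masked per view
    if 2 * k > n:
        raise ValueError("sequence too short for two disjoint masks")
    positions = list(range(n))
    rng.shuffle(positions)
    first, second = set(positions[:k]), set(positions[k:2 * k])
    view_a = [MASK if i in first else t for i, t in enumerate(tokens)]
    view_b = [MASK if i in second else t for i, t in enumerate(tokens)]
    return view_a, view_b, first, second


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] and positives[i] are the positive pair,
    and every other positives[j] in the batch serves as a negative."""

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)
```

In a full post-training loop, each view would be encoded by the language model, a masked-token prediction loss would be computed on the masked positions, and the sequence embeddings of the two views would feed a contrastive term such as `info_nce`. How the two terms are weighted is not stated in the abstract.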

