CL · Apr 27, 2020

Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting

arXiv:2004.12651v1 · 1047 citations · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses performance degradation during transfer learning, a problem that matters to NLP practitioners, though the advance is incremental because it builds on existing fine-tuning paradigms.

The paper tackles catastrophic forgetting in fine-tuning pretrained language models by introducing a recall and learn mechanism that jointly learns pretraining and downstream tasks, achieving state-of-the-art performance on the GLUE benchmark and enabling BERT-base to outperform BERT-large.

Deep pretrained language models have achieved great success through the paradigm of pretraining first and then fine-tuning. However, this sequential transfer learning paradigm often suffers from catastrophic forgetting, which leads to sub-optimal performance. To fine-tune with less forgetting, we propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks. Specifically, we propose a Pretraining Simulation mechanism to recall the knowledge from pretraining tasks without data, and an Objective Shifting mechanism to gradually focus the learning on downstream tasks. Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark. Our method also enables BERT-base to achieve better performance than direct fine-tuning of BERT-large. Further, we provide the open-source RecAdam optimizer, which integrates the proposed mechanisms into the Adam optimizer, for the benefit of the NLP community.
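The two mechanisms named in the abstract can be illustrated with a small sketch. The abstract does not give formulas, so everything here is an assumption for illustration: a quadratic penalty that pulls parameters toward their pretrained values stands in for Pretraining Simulation (it needs no pretraining data), and a sigmoid schedule stands in for Objective Shifting. The function names and the default values of `k`, `t0`, and `gamma` are hypothetical, and this is not the RecAdam API.

```python
import math


def objective_shift_weight(step, k=0.1, t0=250):
    """Assumed sigmoid schedule: weight on the downstream loss, rising from ~0 to ~1."""
    x = k * (step - t0)
    # Numerically stable sigmoid: avoid exp() overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pretraining_simulation_penalty(params, pretrained_params, gamma=5000.0):
    """Assumed data-free stand-in for the pretraining loss:
    0.5 * gamma * squared distance from the pretrained parameters."""
    return 0.5 * gamma * sum(
        (p - p0) ** 2 for p, p0 in zip(params, pretrained_params)
    )


def recall_and_learn_loss(task_loss, params, pretrained_params, step,
                          k=0.1, t0=250, gamma=5000.0):
    """Blend the downstream loss with the penalty; the blend shifts toward
    the downstream loss as training proceeds."""
    lam = objective_shift_weight(step, k, t0)
    penalty = pretraining_simulation_penalty(params, pretrained_params, gamma)
    return lam * task_loss + (1.0 - lam) * penalty


if __name__ == "__main__":
    pretrained = [0.0, 0.0]
    current = [0.01, -0.02]
    for step in (0, 250, 500):
        loss = recall_and_learn_loss(1.0, current, pretrained, step)
        print(step, round(objective_shift_weight(step), 4), round(loss, 4))
```

Early in training the penalty dominates, so the model stays close to its pretrained weights; later the downstream loss dominates. In this sketch `k` controls how sharply the shift happens and `t0` marks the step at which both terms are weighted equally.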

Code Implementations: 1 repo
