CL · AI · Nov 3, 2022

Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively

arXiv:2211.01642v1 · 39 citations · h-index: 28 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses efficient fine-tuning for NLP practitioners and offers an incremental improvement over existing fine-tuning methods.

The paper tackles overfitting and representation degradation when fine-tuning large pre-trained language models. It proposes a Dynamic Parameter Selection (DPS) algorithm that adaptively selects subnetworks to update, which improves performance and stability on the GLUE benchmark and yields better out-of-domain transfer and low-resource results.

Large-scale pre-trained language models have achieved impressive results on a wide range of downstream tasks recently. However, fine-tuning an extremely large-scale pre-trained language model on limited target datasets is often plagued by overfitting and representation degradation. In this paper, we propose a Dynamic Parameter Selection (DPS) algorithm for the large-scale pre-trained models during fine-tuning, which adaptively selects a more promising subnetwork to perform staging updates based on gradients of back-propagation. Experiments on the GLUE benchmark show that DPS outperforms previous fine-tuning methods in terms of overall performance and stability, and consistently achieves better results with variable pre-trained language models. In addition, DPS brings a large magnitude of improvement in out-of-domain transferring experiments and low-resource scenarios, which shows that it can maintain stable general contextual features and reduce the representation collapse. We release our code at https://github.com/ZhangHaojie077/DPS
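The idea of gradient-based subnetwork selection with staged updates can be illustrated with a toy sketch. This is not the authors' DPS implementation (see the linked repository for that). It assumes a tiny linear model, a scoring rule based on accumulated absolute gradients, and hypothetical names and parameters (`select_mask`, `masked_finetune`, `keep_ratio`, `stage_len`) chosen only for illustration.

```python
def select_mask(scores, keep_ratio):
    """Return a 0/1 mask keeping the top `keep_ratio` fraction of entries by score."""
    k = max(1, int(round(keep_ratio * len(scores))))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    mask = [0.0] * len(scores)
    for i in top:
        mask[i] = 1.0
    return mask


def gradients(w, xs, ys):
    """Gradient of the mean squared error for a linear model y = w . x."""
    n = len(xs)
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += 2.0 * err * xj / n
    return g


def masked_finetune(w, xs, ys, keep_ratio=0.5, stage_len=5, stages=4, lr=0.05):
    """Update only a gradient-selected subset of parameters, re-chosen each stage.

    Assumption: the score is the sum of |gradient| over the previous stage.
    The selection criterion in the paper may differ.
    """
    w = list(w)
    mask = [1.0] * len(w)  # first stage updates every parameter while scores accumulate
    history = []
    for _ in range(stages):
        scores = [0.0] * len(w)
        for _ in range(stage_len):
            g = gradients(w, xs, ys)
            # Accumulate gradient magnitudes for all parameters, frozen or not.
            scores = [s + abs(gj) for s, gj in zip(scores, g)]
            # Apply the update only where the current mask is 1.
            w = [wj - lr * mj * gj for wj, mj, gj in zip(w, mask, g)]
        # Pick the subnetwork that the next stage will update.
        mask = select_mask(scores, keep_ratio)
        history.append(mask)
    return w, history
```

In this sketch, parameters outside the selected subnetwork keep their current values during a stage, which is one simple way to limit how far the model drifts from its pre-trained weights.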

Code Implementations: 1 repo