CL | May 24, 2023

Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization

Shoujie Tong, Heming Xia, Damai Dai, Runxin Xu, Tianyu Liu, Binghuai Lin, Yunbo Cao, Zhifang Sui

arXiv:2305.14760v2 | 21.01 | 31 citations | h-index: 34

Originality: Incremental advance

AI Analysis

This paper addresses degraded performance when fine-tuning for natural language understanding, particularly in low-resource settings. The contribution appears incremental, since it builds on dropout-based methods.

The paper tackles the problem of overfitting in fine-tuning pretrained language models on limited data by introducing Bi-Drop, a strategy that dynamically updates parameters using dropout-generated sub-nets, resulting in consistent performance improvements on the GLUE benchmark and enhanced generalization in various scenarios.

Pretrained language models have achieved remarkable success in natural language understanding. However, fine-tuning pretrained models on limited training data tends to overfit and thus diminish performance. This paper presents Bi-Drop, a fine-tuning strategy that selectively updates model parameters using gradients from various sub-nets dynamically generated by dropout. Bi-Drop performs sub-net estimation in an in-batch manner, so it overcomes the hysteresis in sub-net updating that affects previous methods relying on asynchronous sub-net estimation. Bi-Drop also needs only one mini-batch to estimate the sub-net, so it makes fuller use of the training data. Experiments on the GLUE benchmark demonstrate that Bi-Drop consistently outperforms previous fine-tuning methods. Furthermore, empirical results show that Bi-Drop exhibits excellent generalization ability and robustness in domain transfer, data imbalance, and low-resource scenarios.
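The in-batch idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how parameters are selected, so the sign-agreement rule below is an assumption made purely for illustration, as are the toy linear model and the hypothetical names `dropout_mask`, `subnet_gradient`, and `bidrop_style_step`. The sketch runs two dropout-masked passes over the same mini-batch and updates only the parameters whose two sub-net gradients agree in sign.

```python
import random


def dropout_mask(n, p, rng):
    """Sample an inverted-dropout mask: 0.0 for dropped units, 1/(1-p) for kept ones."""
    return [0.0 if rng.random() < p else 1.0 / (1.0 - p) for _ in range(n)]


def subnet_gradient(w, batch, mask):
    """Gradient of the mean squared error of a linear model with masked inputs."""
    grad = [0.0] * len(w)
    for x, y in batch:
        xm = [xi * mi for xi, mi in zip(x, mask)]
        err = sum(wi * xi for wi, xi in zip(w, xm)) - y
        for j, xj in enumerate(xm):
            grad[j] += 2.0 * err * xj / len(batch)
    return grad


def bidrop_style_step(w, batch, lr=0.1, p=0.1, rng=None):
    """One update using two dropout sub-nets estimated on the SAME mini-batch."""
    rng = rng or random.Random(0)
    g1 = subnet_gradient(w, batch, dropout_mask(len(w), p, rng))
    g2 = subnet_gradient(w, batch, dropout_mask(len(w), p, rng))
    # ASSUMPTION (not from the paper's abstract): keep a parameter only if
    # both sub-net gradients are nonzero and point in the same direction.
    selected = [a * b > 0 for a, b in zip(g1, g2)]
    # Selected parameters move along the averaged gradient; the rest stay frozen.
    new_w = [
        wi - lr * 0.5 * (a + b) if keep else wi
        for wi, a, b, keep in zip(w, g1, g2, selected)
    ]
    return new_w, selected
```

Because both sub-net gradients come from the same mini-batch, the selection and the update happen in one step, which is the property the abstract contrasts with asynchronous sub-net estimation. Any other selection criterion could be substituted at the marked line.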
