CL LG Jun 10, 2020

Revisiting Few-sample BERT Fine-tuning

Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Q. Weinberger, Yoav Artzi

arXiv:2006.05987v3 17.0500 citations Has Code

Originality: Incremental advance

AI Analysis

This paper addresses a practical problem for NLP practitioners who use BERT in low-data settings. It is incremental because it builds on existing fine-tuning methods.

The paper tackles the instability of BERT fine-tuning in few-sample scenarios. It identifies three contributing factors: an optimization method with biased gradient estimation, parts of the BERT network that have limited applicability to downstream tasks, and a small, pre-determined number of training iterations. It finds that alternative practices resolve this instability, and that the benefit of recently proposed improvement methods diminishes significantly under the modified process.

This paper is a study of fine-tuning of BERT contextual representations, with focus on commonly observed instabilities in few-sample scenarios. We identify several factors that cause this instability: the common use of a non-standard optimization method with biased gradient estimation; the limited applicability of significant parts of the BERT network for down-stream tasks; and the prevalent practice of using a pre-determined, and small number of training iterations. We empirically test the impact of these factors, and identify alternative practices that resolve the commonly observed instability of the process. In light of these observations, we re-visit recently proposed methods to improve few-sample fine-tuning with BERT and re-evaluate their effectiveness. Generally, we observe the impact of these methods diminishes significantly with our modified process.
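To make the first factor concrete, here is a minimal sketch of how an Adam-style update behaves with and without bias correction. It assumes that the "biased gradient estimation" mentioned in the abstract refers to omitting Adam's bias-correction terms. The function name `adam_update_sizes`, the scalar setup, and the hyperparameter values are illustrative choices, not taken from the paper or its code.

```python
import math


def adam_update_sizes(grads, lr=2e-5, beta1=0.9, beta2=0.999, eps=1e-6,
                      bias_correction=True):
    """Return the Adam update applied at each step for one scalar parameter.

    Hypothetical helper for illustration only; hyperparameters are assumptions.
    """
    m = 0.0  # running estimate of the first moment (mean of gradients)
    v = 0.0  # running estimate of the second moment (mean of squared gradients)
    updates = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        if bias_correction:
            # Standard Adam: rescale the zero-initialized moment estimates.
            m_hat = m / (1 - beta1 ** t)
            v_hat = v / (1 - beta2 ** t)
        else:
            # Assumed "non-standard" variant: use the raw, biased estimates.
            m_hat, v_hat = m, v
        updates.append(lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates


# Constant gradient of 1.0, so any difference comes from bias correction alone.
grads = [1.0] * 5000
corrected = adam_update_sizes(grads, bias_correction=True)
uncorrected = adam_update_sizes(grads, bias_correction=False)

# Ratio of update sizes at the first and last step.
first_ratio = uncorrected[0] / corrected[0]
last_ratio = uncorrected[-1] / corrected[-1]
print(f"first step ratio: {first_ratio:.2f}, last step ratio: {last_ratio:.4f}")
```

In this toy setting the uncorrected variant takes noticeably larger steps early in training, and the two variants converge only after thousands of iterations. With few samples and a small number of iterations, most of training would fall in that early regime, which is one plausible reading of why the abstract ties the optimizer to instability.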
