CL · LG · Jun 10, 2020

Revisiting Few-sample BERT Fine-tuning

arXiv:2006.05987v3 · 500 citations
Originality: Incremental advance
AI Analysis

This paper addresses a practical problem for NLP practitioners using BERT in low-data settings, but it is an incremental advance because it builds on existing fine-tuning methods.

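The paper tackles the instability of BERT fine-tuning in few-sample scenarios. It identifies three contributing factors: an optimization method with biased gradient estimation, the limited applicability of significant parts of the BERT network to downstream tasks, and a small, pre-determined number of training iterations. It finds that alternative practices resolve this instability, and that recently proposed improvement methods have much less impact once those practices are adopted.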

This paper is a study of fine-tuning of BERT contextual representations, with focus on commonly observed instabilities in few-sample scenarios. We identify several factors that cause this instability: the common use of a non-standard optimization method with biased gradient estimation; the limited applicability of significant parts of the BERT network for down-stream tasks; and the prevalent practice of using a pre-determined, and small number of training iterations. We empirically test the impact of these factors, and identify alternative practices that resolve the commonly observed instability of the process. In light of these observations, we re-visit recently proposed methods to improve few-sample fine-tuning with BERT and re-evaluate their effectiveness. Generally, we observe the impact of these methods diminishes significantly with our modified process.
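To make the first factor concrete, the sketch below compares Adam update sizes with and without bias correction on a scalar parameter. The abstract does not name the optimizer here; this example assumes the "non-standard optimization method" is Adam with its bias-correction step omitted. The function name `adam_step_sizes` and the hyperparameter values are illustrative choices, not taken from the paper.

```python
import math


def adam_step_sizes(grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                    bias_correction=True):
    """Return the update Adam would apply at each step for a scalar parameter.

    Illustrative only: `bias_correction=False` mimics the assumed
    non-standard variant that skips the correction of the moment estimates.
    """
    m = 0.0  # running estimate of the first moment (mean of gradients)
    v = 0.0  # running estimate of the second moment (mean of squared gradients)
    updates = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        if bias_correction:
            # Standard Adam: undo the bias caused by initializing m and v at zero.
            m_hat = m / (1 - beta1 ** t)
            v_hat = v / (1 - beta2 ** t)
        else:
            # Assumed non-standard variant: use the biased estimates directly.
            m_hat, v_hat = m, v
        updates.append(lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates


if __name__ == "__main__":
    grads = [1.0] * 5  # constant gradient keeps the comparison easy to read
    corrected = adam_step_sizes(grads, bias_correction=True)
    uncorrected = adam_step_sizes(grads, bias_correction=False)
    for t, (c, u) in enumerate(zip(corrected, uncorrected), start=1):
        print(f"step {t}: corrected={c:.6f} uncorrected={u:.6f} ratio={u / c:.2f}")
```

With a constant gradient, the corrected update stays close to the nominal learning rate, while the uncorrected first step is larger by a factor of roughly (1 - beta1) / sqrt(1 - beta2), about 3.16 for these defaults. Under the stated assumption, this inflated early step size is one plausible way the biased estimate could destabilize training when only a small number of iterations are run.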

Code Implementations: 1 repo
