LG, AI
Oct 20, 2022

Surgical Fine-Tuning Improves Adaptation to Distribution Shifts

Yoonho Lee, Annie S. Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Percy Liang, Chelsea Finn

arXiv:2210.11466v3 | 39.0 | 280 citations | h-index: 102 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficient transfer learning under distribution shifts for machine learning practitioners, offering an incremental improvement over existing fine-tuning approaches.

The paper tackles the problem of adapting pre-trained models to distribution shifts by proposing surgical fine-tuning, which selectively tunes a subset of layers based on the shift type, and shows it matches or outperforms standard fine-tuning methods across seven real-world tasks.

A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task. This paper shows that in such settings, selectively fine-tuning a subset of layers (which we term surgical fine-tuning) matches or outperforms commonly used fine-tuning approaches. Moreover, the type of distribution shift influences which subset is more effective to tune: for example, for image corruptions, fine-tuning only the first few layers works best. We validate our findings systematically across seven real-world data tasks spanning three types of distribution shifts. Theoretically, we prove that for two-layer neural networks in an idealized setting, first-layer tuning can outperform fine-tuning all layers. Intuitively, fine-tuning more parameters on a small target dataset can cause information learned during pre-training to be forgotten, and the relevant information depends on the type of shift.
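The layer-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper `surgical_mask`, the parameter names, and the block prefixes below are all hypothetical, chosen only to show how one might mark a single block of a network as trainable while freezing the rest.

```python
from typing import Dict, Iterable, Sequence


def surgical_mask(param_names: Iterable[str],
                  tuned_prefixes: Sequence[str]) -> Dict[str, bool]:
    """Map each parameter name to True (update it) or False (keep it frozen).

    A parameter is tuned only if its name starts with one of the prefixes.
    """
    prefixes = tuple(tuned_prefixes)
    return {name: name.startswith(prefixes) for name in param_names}


# Hypothetical parameter names for a small three-block network (assumption).
PARAM_NAMES = [
    "block1.conv.weight", "block1.conv.bias",
    "block2.conv.weight", "block2.conv.bias",
    "head.fc.weight", "head.fc.bias",
]

# Tune only the first block, e.g. for an input-level shift such as
# image corruptions, as the abstract suggests.
first_block_only = surgical_mask(PARAM_NAMES, ["block1."])

# Tune only the final layer, the common baseline mentioned in the abstract.
last_layer_only = surgical_mask(PARAM_NAMES, ["head."])

for name in PARAM_NAMES:
    state = "tune" if first_block_only[name] else "freeze"
    print(f"{name}: {state}")
```

In a framework such as PyTorch, a mask like this could be applied by setting `requires_grad` on each named parameter before building the optimizer. Which block to tune is the empirical question the paper studies, so the choice of prefix here is only a placeholder.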
