LG · AI · Oct 20, 2022

Surgical Fine-Tuning Improves Adaptation to Distribution Shifts

arXiv:2210.11466v3 · 280 citations · h-index: 102
Originality: Incremental advance
AI Analysis

This work addresses efficient transfer learning under distribution shift, a practical concern for machine learning practitioners, and offers an incremental improvement over existing fine-tuning approaches.

The paper tackles the problem of adapting pre-trained models to distribution shifts by proposing surgical fine-tuning, which selectively tunes a subset of layers based on the shift type, and shows it matches or outperforms standard fine-tuning methods across seven real-world tasks.

A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task. This paper shows that in such settings, selectively fine-tuning a subset of layers (which we term surgical fine-tuning) matches or outperforms commonly used fine-tuning approaches. Moreover, the type of distribution shift influences which subset is more effective to tune: for example, for image corruptions, fine-tuning only the first few layers works best. We validate our findings systematically across seven real-world data tasks spanning three types of distribution shifts. Theoretically, we prove that for two-layer neural networks in an idealized setting, first-layer tuning can outperform fine-tuning all layers. Intuitively, fine-tuning more parameters on a small target dataset can cause information learned during pre-training to be forgotten, and the relevant information depends on the type of shift.
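The core idea, updating only a chosen subset of layers while the rest stay at their pre-trained values, can be illustrated with a toy sketch. This is not the paper's implementation or experimental setup: the scalar two-layer linear model, the names `surgical_fine_tune`, `layer1`, and `layer2`, and the weights and data are all hypothetical, chosen only to keep the example self-contained. In a framework such as PyTorch, the same effect is usually obtained by setting `requires_grad = False` on the parameters to be frozen.

```python
from typing import Dict, Iterable, List, Tuple

Params = Dict[str, float]


def predict(params: Params, x: float) -> float:
    """Toy two-layer linear model: layer1 maps input to a feature, layer2 maps it to the output."""
    return params["layer2"] * (params["layer1"] * x)


def gradients(params: Params, x: float, y: float) -> Params:
    """Gradients of the squared error 0.5 * (prediction - y)^2 with respect to each layer."""
    err = predict(params, x) - y
    return {
        "layer1": err * params["layer2"] * x,
        "layer2": err * params["layer1"] * x,
    }


def surgical_fine_tune(
    params: Params,
    data: List[Tuple[float, float]],
    trainable: Iterable[str],
    lr: float = 0.05,
    epochs: int = 200,
) -> Params:
    """Run SGD on the target data, updating only the layers named in `trainable`.

    Layers not listed keep their pre-trained values (they are frozen).
    """
    tuned = dict(params)  # copy so the pre-trained weights are left untouched
    trainable = list(trainable)
    for _ in range(epochs):
        for x, y in data:
            grads = gradients(tuned, x, y)
            for name in trainable:
                tuned[name] -= lr * grads[name]
    return tuned


# Hypothetical pre-trained weights and a small target dataset following y = 3x.
pretrained = {"layer1": 1.0, "layer2": 2.0}
target_data = [(1.0, 3.0), (2.0, 6.0)]

# Tune only the first layer; layer2 stays at its pre-trained value.
first_only = surgical_fine_tune(pretrained, target_data, trainable=["layer1"])
# Tune every layer, for comparison with standard full fine-tuning.
all_layers = surgical_fine_tune(pretrained, target_data, trainable=["layer1", "layer2"])
```

In this sketch both variants fit the target data, but `first_only` leaves `layer2` exactly as pre-trained, whereas `all_layers` moves both weights. Which subset to tune on a real model is an empirical choice that, per the abstract, depends on the type of shift.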

Code Implementations: 1 repo