ML · LG · Apr 18, 2024

Understanding Optimal Feature Transfer via a Fine-Grained Bias-Variance Analysis

arXiv:2404.12481v2 · 2 citations · h-index: 1
AI Analysis

This work provides theoretical insights for researchers in transfer learning, though it is incremental as it builds on existing linear models and asymptotic analysis.

The paper tackles the problem of optimizing downstream performance in transfer learning by deriving exact asymptotics of downstream risk and its bias-variance decomposition, identifying that the optimal pretrained representation is naturally sparse and exhibits a phase transition from hard to soft feature selection.

In the transfer learning paradigm, models learn useful representations (or features) during a data-rich pretraining stage, and then use the pretrained representation to improve model performance on data-scarce downstream tasks. In this work, we explore transfer learning with the goal of optimizing downstream performance. We introduce a simple linear model that takes as input an arbitrary pretrained feature transform. We derive exact asymptotics of the downstream risk and its fine-grained bias-variance decomposition. We then identify the pretrained representation that optimizes the asymptotic downstream bias and variance averaged over an ensemble of downstream tasks. Our theoretical and empirical analysis uncovers the surprising phenomenon that the optimal featurization is naturally sparse, even in the absence of explicit sparsity-inducing priors or penalties. Additionally, we identify a phase transition where the optimal pretrained representation shifts from hard selection to soft selection of relevant features.
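
To make the bias-variance trade-off in the abstract concrete, here is a minimal simulation sketch. It is not the paper's model or method: the isotropic Gaussian inputs, the diagonal feature transform `weights`, the ridge penalty, and the function name `downstream_bias_variance` are all assumptions chosen for illustration. It estimates the squared bias and the variance of a downstream linear predictor by Monte Carlo, for three hypothetical transforms: no selection, hard selection (weights of 0 or 1), and soft selection (small but nonzero weights on irrelevant features).

```python
import numpy as np


def downstream_bias_variance(weights, beta, n_train=40, noise_std=0.5,
                             ridge=1e-2, n_trials=300, seed=0):
    """Monte Carlo estimate of squared bias and variance of a downstream fit.

    Illustrative assumptions (not taken from the paper): inputs are isotropic
    Gaussian, the pretrained transform is a diagonal rescaling `weights`, and
    the downstream learner is ridge regression on the rescaled features.
    With isotropic inputs, the excess risk of a coefficient estimate b is
    ||b - beta||^2, which splits into squared bias plus variance.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    d = beta.shape[0]
    estimates = np.empty((n_trials, d))
    for t in range(n_trials):
        # Draw a fresh, small downstream training set.
        X = rng.standard_normal((n_train, d))
        y = X @ beta + noise_std * rng.standard_normal(n_train)
        # Apply the (hypothetical) diagonal feature transform.
        Z = X * weights
        # Ridge regression in the transformed feature space.
        gram = Z.T @ Z + ridge * n_train * np.eye(d)
        theta = np.linalg.solve(gram, Z.T @ y)
        # Express the fitted predictor in the original input coordinates.
        estimates[t] = weights * theta
    mean_estimate = estimates.mean(axis=0)
    bias_sq = float(np.sum((mean_estimate - beta) ** 2))
    variance = float(np.mean(np.sum((estimates - mean_estimate) ** 2, axis=1)))
    return bias_sq, variance


# Hypothetical downstream task: only the first 5 of 30 features matter.
d = 30
beta = np.zeros(d)
beta[:5] = 1.0

transforms = {
    "no selection": np.ones(d),
    "hard selection": (beta != 0).astype(float),
    "soft selection": np.where(beta != 0, 1.0, 0.3),
}

for name, weights in transforms.items():
    bias_sq, variance = downstream_bias_variance(weights, beta)
    print(f"{name:>15}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```

In this toy setting the hard-selection weights are built from the true support of `beta`, which a real pretraining stage would not know exactly. The sketch only shows how down-weighting irrelevant features changes the variance of the downstream fit. It does not reproduce the paper's asymptotic results or its phase transition.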

