LG · ML · Mar 31, 2024

Minimum-Norm Interpolation Under Covariate Shift

arXiv:2404.00522v2 · 13 citations · h-index: 9 · ICML
AI Analysis

This work addresses a foundational problem in machine learning theory and offers researchers and practitioners insight into transfer learning with overparameterized models. It is incremental in that it builds on existing in-distribution research on benign overfitting.

The paper tackles the theoretical gap in understanding transfer learning for high-dimensional linear models by proving the first non-asymptotic excess risk bounds for benignly-overfit linear interpolators under covariate shift, identifying beneficial and malignant shifts based on overparameterization.

Transfer learning is a critical part of real-world machine learning deployments and has been extensively studied in experimental works with overparameterized neural networks. However, even in the simplest setting of linear regression, a notable gap still exists in the theoretical understanding of transfer learning. In-distribution research on high-dimensional linear regression has led to the identification of a phenomenon known as benign overfitting, in which linear interpolators overfit to noisy training labels and yet still generalize well. This behavior occurs under specific conditions on the source covariance matrix and input data dimension. Therefore, it is natural to wonder how such high-dimensional linear models behave under transfer learning. We prove the first non-asymptotic excess risk bounds for benignly-overfit linear interpolators in the transfer learning setting. From our analysis, we propose a taxonomy of beneficial and malignant covariate shifts based on the degree of overparameterization. We follow our analysis with empirical studies that show these beneficial and malignant covariate shifts for linear interpolators on real image data, and for fully-connected neural networks in settings where the input data dimension is larger than the training sample size.
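To make the setting in the abstract concrete, the following is a minimal sketch of a minimum-norm linear interpolator fit on source data and then scored under a source and a target covariance. It assumes NumPy, and everything specific in it is a hypothetical choice for illustration rather than the paper's experimental setup: the spiked diagonal spectrum, the sample size and dimension, the noise level, the target covariance, and the helper names `min_norm_interpolator` and `excess_risk`. It does not reproduce the paper's bounds or its taxonomy of shifts.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Minimum-l2-norm solution of X @ theta = y, computed with the pseudoinverse."""
    return np.linalg.pinv(X) @ y


def excess_risk(theta_hat, theta_star, cov_diag):
    """(theta_hat - theta_star)^T Sigma (theta_hat - theta_star) for diagonal Sigma."""
    diff = theta_hat - theta_star
    return float(np.sum(cov_diag * diff ** 2))


rng = np.random.default_rng(0)
n, d = 50, 500  # overparameterized: dimension exceeds sample size

# Hypothetical source spectrum: a few strong directions plus a flat tail.
source_var = np.ones(d)
source_var[:5] = 25.0

# Hypothetical target spectrum: same head, less energy in the tail.
target_var = source_var.copy()
target_var[5:] *= 0.1

# Hypothetical ground truth supported on the strong directions.
theta_star = np.zeros(d)
theta_star[:5] = 1.0

# Source training data with noisy labels.
X = rng.standard_normal((n, d)) * np.sqrt(source_var)
y = X @ theta_star + 0.5 * rng.standard_normal(n)

theta_hat = min_norm_interpolator(X, y)

# The interpolator fits the noisy training labels (up to floating-point error).
train_residual = float(np.max(np.abs(X @ theta_hat - y)))

# The same estimator is scored under the source and the shifted target covariance.
risk_source = excess_risk(theta_hat, theta_star, source_var)
risk_target = excess_risk(theta_hat, theta_star, target_var)

print(f"max train residual: {train_residual:.2e}")
print(f"excess risk (source): {risk_source:.4f}")
print(f"excess risk (target): {risk_target:.4f}")
```

Because the hypothetical target variances here are elementwise no larger than the source variances, the target excess risk in this sketch cannot exceed the source excess risk. Replacing `target_var` with a spectrum that puts more energy in the tail directions is one way to explore the opposite case.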

