ML | LG | Feb 3, 2025

Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization

arXiv:2502.01347v2 | 10 citations | h-index: 6 | ICML
Originality: Incremental advance
AI Analysis

This paper addresses robustness and fairness concerns in machine learning by analyzing how models learn spurious correlations. The advance is incremental, since it builds on existing statistical frameworks.

The paper characterizes how high-dimensional linear regression learns spurious correlations from non-predictive features, and quantifies the effect in terms of the data covariance and the strength of the ridge regularization. It shows a trade-off: the regularization strength that minimizes the test loss lies in a range where spurious correlations are still increasing. The results are validated on synthetic and real datasets.

Learning models have been shown to rely on spurious correlations between non-predictive features and the associated labels in the training data, with negative implications on robustness, bias and fairness. In this work, we provide a statistical characterization of this phenomenon for high-dimensional regression, when the data contains a predictive core feature $x$ and a spurious feature $y$. Specifically, we quantify the amount of spurious correlations $C$ learned via linear regression, in terms of the data covariance and the strength $λ$ of the ridge regularization. As a consequence, we first capture the simplicity of $y$ through the spectrum of its covariance, and its correlation with $x$ through the Schur complement of the full data covariance. Next, we prove a trade-off between $C$ and the in-distribution test loss $L$, by showing that the value of $λ$ that minimizes $L$ lies in an interval where $C$ is increasing. Finally, we investigate the effects of over-parameterization via the random features model, by showing its equivalence to regularized linear regression. Our theoretical results are supported by numerical experiments on Gaussian, Color-MNIST, and CIFAR-10 datasets.

