ML LGSep 25, 2025

Breaking the curse of dimensionality for linear rules: optimal predictors over the ellipsoid

arXiv:2509.21174v14.51 citationsh-index: 3

Originality Incremental advance

AI Analysis

This work addresses a foundational issue in statistical learning theory for researchers, providing insights into structural assumptions to mitigate the curse of dimensionality, though it appears incremental as it builds on classical settings and known methods.

The paper tackles the problem of preventing the degradation of statistical learning bounds with increasing dimensionality by deriving non-asymptotic upper and lower bounds on generalization error for linear prediction rules under the assumption that the Bayes predictor lies in an ellipsoid, and establishes a lower bound for rotationally invariant rules with a fixed predictor, highlighting contributions from intrinsic dimensionality and high-dimensional noiseless error.

In this work, we address the following question: What minimal structural assumptions are needed to prevent the degradation of statistical learning bounds with increasing dimensionality? We investigate this question in the classical statistical setting of signal estimation from $n$ independent linear observations $Y_i = X_i^{\top}θ+ ε_i$. Our focus is on the generalization properties of a broad family of predictors that can be expressed as linear combinations of the training labels, $f(X) = \sum_{i=1}^{n} l_{i}(X) Y_i$. This class -- commonly referred to as linear prediction rules -- encompasses a wide range of popular parametric and non-parametric estimators, including ridge regression, gradient descent, and kernel methods. Our contributions are twofold. First, we derive non-asymptotic upper and lower bounds on the generalization error for this class under the assumption that the Bayes predictor $θ$ lies in an ellipsoid. Second, we establish a lower bound for the subclass of rotationally invariant linear prediction rules when the Bayes predictor is fixed. Our analysis highlights two fundamental contributions to the risk: (a) a variance-like term that captures the intrinsic dimensionality of the data; (b) the noiseless error, a term that arises specifically in the high-dimensional regime. These findings shed light on the role of structural assumptions in mitigating the curse of dimensionality.

View on arXiv PDF

Similar