ML LG ST · Jun 16, 2020

Risk bounds when learning infinitely many response functions by ordinary linear regression

Vincent Plassier, François Portier, Johan Segers

arXiv:2006.09223v33.83 citations

Originality Synthesis-oriented

AI Analysis

This work addresses the challenge of efficiently building many surrogate models for applications that require a large number of predictions. Its contribution is incremental: it extends existing risk-bound theory to infinite sets of response functions.

The paper tackles the problem of learning many response functions simultaneously from a single input sample, using ordinary linear regression in a high-dimensional feature space. It provides convergence guarantees on the worst-case excess prediction risk that hold uniformly over the response functions, even as the feature dimension grows with the sample size.

Consider the problem of learning a large number of response functions simultaneously based on the same input variables. The training data consist of a single independent random sample of the input variables drawn from a common distribution together with the associated responses. The input variables are mapped into a high-dimensional linear space, called the feature space, and the response functions are modelled as linear functionals of the mapped features, with coefficients calibrated via ordinary least squares. We provide convergence guarantees on the worst-case excess prediction risk by controlling the convergence rate of the excess risk uniformly in the response function. The dimension of the feature map is allowed to tend to infinity with the sample size. The collection of response functions, although potentially infinite, is supposed to have a finite Vapnik-Chervonenkis dimension. The bound derived can be applied when building multiple surrogate models in a reasonable computing time.
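The setup described in the abstract can be illustrated with a minimal sketch: one input sample, one feature map, and many responses fitted by ordinary least squares in a single solve. Everything concrete here is an assumption made for illustration and is not taken from the paper: the polynomial feature map `feature_map`, the response family f_theta(x) = sin(theta * x), the sample sizes, and the use of the worst-case test mean squared error as a rough proxy for the worst-case excess risk.

```python
import numpy as np


def feature_map(x, m):
    """Hypothetical feature map: monomials 1, x, ..., x^(m-1)."""
    return np.vander(x, N=m, increasing=True)


def fit_many_responses(Phi, Y):
    """OLS for all responses at once.

    Phi has shape (n, m); Y has shape (n, K), one column per response.
    np.linalg.lstsq accepts a matrix right-hand side, so the design
    matrix is factorised once and shared by all K responses.
    """
    B, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return B  # shape (m, K)


rng = np.random.default_rng(0)
n, m = 500, 6

# A single input sample shared by every response function.
x = rng.uniform(-1.0, 1.0, size=n)

# Illustrative family of responses indexed by theta (a finite grid
# standing in for a possibly infinite collection).
thetas = np.linspace(0.5, 3.0, 200)
Y = np.sin(np.outer(x, thetas))

Phi = feature_map(x, m)
B = fit_many_responses(Phi, Y)

# Evaluate on fresh inputs and take the worst case over the family.
x_test = rng.uniform(-1.0, 1.0, size=2000)
pred = feature_map(x_test, m) @ B
truth = np.sin(np.outer(x_test, thetas))
mse = np.mean((pred - truth) ** 2, axis=0)
worst_case_mse = mse.max()
```

Because the design matrix is the same for every response, the cost of adding another response is one extra right-hand side rather than a new regression, which is the computational point behind building many surrogate models at once.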
