Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization
This work addresses the fundamental trade-off between interpolation and generalization in machine learning, with implications for norm-based bounds and model selection, though it is incremental in extending linear results to shallow neural networks.
The paper tackles the generalization capability of nearly-interpolating linear regressors, showing that they exhibit rapid norm growth, specifically squared ℓ2-norm scaling as Ω(n^α) with α>1, and characterizes a trade-off where larger α leads to worse generalization.
We study the generalization capability of nearly-interpolating linear regressors: $\boldsymbolβ$'s whose training error $τ$ is positive but small, i.e., below the noise floor. Under a random matrix theoretic assumption on the data distribution and an eigendecay assumption on the data covariance matrix $\boldsymbolΣ$, we demonstrate that any near-interpolator exhibits rapid norm growth: for $τ$ fixed, $\boldsymbolβ$ has squared $\ell_2$-norm $\mathbb{E}[\|{\boldsymbolβ}\|_{2}^{2}] = Ω(n^α)$ where $n$ is the number of samples and $α>1$ is the exponent of the eigendecay, i.e., $λ_i(\boldsymbolΣ) \sim i^{-α}$. This implies that existing data-independent norm-based bounds are necessarily loose. On the other hand, in the same regime we precisely characterize the asymptotic trade-off between interpolation and generalization. Our characterization reveals that larger norm scaling exponents $α$ correspond to worse trade-offs between interpolation and generalization. We verify empirically that a similar phenomenon holds for nearly-interpolating shallow neural networks.