Regularization Trade-offs with Fake Features
This work advances the theoretical understanding of overparameterization and is aimed at machine learning researchers, but its contribution is incremental because it builds on existing frameworks for model misspecification.
The paper investigates how fake features, which are present in the model but not in the data, affect generalization in ridge regression. It derives a non-asymptotic bound that reveals a trade-off between the implicit regularization provided by the fake features and the explicit regularization provided by the ridge parameter. Numerical results show that the optimal ridge parameter depends on the number of fake features.
Recent successes of massively overparameterized models have inspired a new line of work investigating the underlying conditions that enable overparameterized models to generalize well. This paper considers a framework where the possibly overparameterized model includes fake features, i.e., features that are present in the model but not in the data. We present a non-asymptotic high-probability bound on the generalization error of the ridge regression problem under the model misspecification of having fake features. Our high-probability results provide insights into the interplay between the implicit regularization provided by the fake features and the explicit regularization provided by the ridge parameter. Numerical results illustrate this trade-off and show that the optimal ridge parameter may depend heavily on the number of fake features.
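The setting can be illustrated with a small simulation. The sketch below is not the paper's experiment: the function names (`ridge_fit`, `generalization_error`), the Gaussian design, the noise level, the problem sizes, and the choice to draw fresh fake features at test time are all assumptions made for illustration. It fits ridge regression on the true features augmented with independent fake features and estimates the prediction error on fresh samples.

```python
import numpy as np


def ridge_fit(A, y, lam):
    """Ridge solution w = (A^T A + lam I)^{-1} A^T y, assuming lam > 0."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)


def generalization_error(n_train, p_true, p_fake, lam,
                         noise_std=0.5, n_test=2000, seed=0):
    """Monte Carlo estimate of the test error with p_fake fake features.

    Assumptions (illustrative only): i.i.d. standard Gaussian features,
    Gaussian observation noise, and fake features that are independent of
    the data and redrawn at test time.
    """
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(p_true) / np.sqrt(p_true)

    # Training data: the output depends only on the true features.
    X_train = rng.standard_normal((n_train, p_true))
    y_train = X_train @ theta + noise_std * rng.standard_normal(n_train)

    # The model additionally sees fake features that carry no signal.
    F_train = rng.standard_normal((n_train, p_fake))
    w = ridge_fit(np.hstack([X_train, F_train]), y_train, lam)

    # Fresh test samples, compared against the noiseless target.
    X_test = rng.standard_normal((n_test, p_true))
    F_test = rng.standard_normal((n_test, p_fake))
    y_pred = np.hstack([X_test, F_test]) @ w
    return float(np.mean((y_pred - X_test @ theta) ** 2))


if __name__ == "__main__":
    # Sweep the ridge parameter for several numbers of fake features.
    for p_fake in (0, 20, 100):
        for lam in (0.01, 1.0, 100.0):
            err = generalization_error(n_train=50, p_true=20,
                                       p_fake=p_fake, lam=lam)
            print(f"p_fake={p_fake:4d}  lam={lam:7.2f}  error={err:.4f}")
```

Comparing the printed errors across `lam` for each `p_fake` gives a rough view of how the best ridge parameter can shift as fake features are added. A single seed is noisy, so averaging over several seeds would be needed before drawing conclusions.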