Optimizing for Generalization in Machine Learning with Cross-Validation Gradients
This work addresses hyperparameter selection, a recurring problem for machine learning practitioners, and offers a more efficient method for model tuning, though it appears incremental because it builds on existing cross-validation techniques.
The paper tackles hyperparameter optimization in machine learning by showing that the cross-validation risk is differentiable for many common algorithms. It proposes a cross-validation gradient method (CVGM) that optimizes this risk efficiently in high-dimensional hyperparameter spaces, with the aim of improving generalization performance.
Cross-validation is the workhorse of modern applied statistics and machine learning, as it provides a principled framework for selecting the model that maximizes generalization performance. In this paper, we show that the cross-validation risk is differentiable with respect to the hyperparameters and the training data for many common machine learning algorithms, including logistic regression, elastic-net regression, and support vector machines. Leveraging this differentiability, we propose a cross-validation gradient method (CVGM) for hyperparameter optimization. Our method enables efficient optimization of the cross-validation risk, the best surrogate for the true generalization ability of our learning algorithm, even in high-dimensional hyperparameter spaces.
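To make the idea of differentiating the cross-validation risk concrete, the following is a minimal sketch and not the paper's CVGM implementation. It assumes one-dimensional ridge regression as a stand-in learner, chosen only because its closed-form solution makes the derivative easy to write by hand; the abstract itself names logistic regression, elastic-net regression, and support vector machines. All function names (`make_data`, `folds`, `cv_risk_and_grad`, `tune`), the synthetic data, and the backtracking step rule are illustrative assumptions. Only the derivative with respect to the hyperparameter is shown, not the derivative with respect to the training data.

```python
import math
import random


def make_data(n=60, true_w=2.0, noise=1.0, seed=0):
    """Synthetic 1-D regression data (hypothetical, for illustration only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_w * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def folds(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    for j in range(k):
        val = list(range(j * n // k, (j + 1) * n // k))
        val_set = set(val)
        train = [i for i in range(n) if i not in val_set]
        yield train, val


def cv_risk_and_grad(lam, xs, ys, k=5):
    """Return the k-fold CV risk of 1-D ridge regression and d(risk)/d(lam)."""
    risk, grad = 0.0, 0.0
    for train, val in folds(len(xs), k):
        sxy = sum(xs[i] * ys[i] for i in train)
        sxx = sum(xs[i] ** 2 for i in train)
        w = sxy / (sxx + lam)           # ridge solution on the training folds
        dw = -sxy / (sxx + lam) ** 2    # dw/dlam from the closed form
        res = [ys[i] - w * xs[i] for i in val]
        risk += sum(r * r for r in res) / len(val) / k
        # Chain rule: d(val loss)/dw times dw/dlam, averaged over folds.
        dloss_dw = sum(-2.0 * r * xs[i] for r, i in zip(res, val)) / len(val)
        grad += dloss_dw * dw / k
    return risk, grad


def tune(xs, ys, lam0=1.0, step=1.0, iters=100):
    """Gradient descent on theta = log(lam); accepts only improving steps."""
    theta = math.log(lam0)
    risk, grad = cv_risk_and_grad(math.exp(theta), xs, ys)
    for _ in range(iters):
        g_theta = grad * math.exp(theta)    # d(risk)/d(theta) by the chain rule
        cand = theta - step * g_theta
        cand_risk, cand_grad = cv_risk_and_grad(math.exp(cand), xs, ys)
        if cand_risk < risk:
            theta, risk, grad = cand, cand_risk, cand_grad
        else:
            step *= 0.5                     # shrink the step and try again
    return math.exp(theta), risk


if __name__ == "__main__":
    xs, ys = make_data()
    best_lam, best_risk = tune(xs, ys)
    print("tuned lambda:", best_lam, "CV risk:", best_risk)
```

Optimizing over log(lam) rather than lam keeps the regularization strength positive without an explicit constraint. With many hyperparameters, the same gradient would more plausibly come from automatic differentiation than from a hand-derived formula; that is a design choice of this sketch, not a claim about the paper.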