Stability Regularized Cross-Validation
The paper addresses unstable model performance, a concern for practitioners who rely on interpretable models such as sparse regression and CART. The contribution is incremental in that it builds on existing cross-validation methods.
The authors aim to improve the test-set performance of models tuned by cross-validation. They introduce a nested k-fold scheme that selects hyperparameters by minimizing a weighted sum of the usual cross-validation metric and an empirical model-stability measure, which reduces the risk of poor generalization due to instability. On 13 UCI datasets, and relative to k-fold cross-validation over the same hyperparameters, the scheme improves out-of-sample MSE by 4% on average for sparse ridge regression and CART, but has no impact on XGBoost.
We revisit the problem of ensuring strong test-set performance via cross-validation. Motivated by the generalization theory literature, we propose a nested k-fold cross-validation scheme that selects hyperparameters by minimizing a weighted sum of the usual cross-validation metric and an empirical model-stability measure. The weight on the stability term is itself chosen via a nested cross-validation procedure. This reduces the risk that instability produces strong validation-set performance but poor test-set performance. We benchmark our procedure on a suite of 13 real-world UCI datasets, and find that, compared to k-fold cross-validation over the same hyperparameters, it improves the out-of-sample MSE for sparse ridge regression and CART by 4% on average, but has no impact on XGBoost. This suggests that for interpretable and unstable models, such as sparse regression and CART, our approach is a viable and computationally affordable method for improving test-set performance.
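The selection rule described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain ridge regression (not the sparse ridge or CART models of the paper), and the instability measure is an assumed proxy, namely the mean squared gap between the predictions of each fold model and those of the model fit on all rows. The function names (`cv_and_instability`, `select_alpha`, `stability_regularized_cv`) and the candidate grids are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    # Closed-form ridge solution (no intercept, for brevity).
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def kfold_indices(n, k, rng):
    # Shuffle row indices and split them into k roughly equal folds.
    return np.array_split(rng.permutation(n), k)

def cv_and_instability(X, y, alpha, folds):
    # Returns (k-fold CV MSE, instability) for one hyperparameter value.
    # ASSUMPTION: instability is the mean squared gap between each fold
    # model's predictions and those of the model trained on all rows.
    w_full = ridge_fit(X, y, alpha)
    errs, gaps = [], []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        w = ridge_fit(X[train], y[train], alpha)
        errs.append(np.mean((X[held_out] @ w - y[held_out]) ** 2))
        gaps.append(np.mean((X @ w - X @ w_full) ** 2))
    return float(np.mean(errs)), float(np.mean(gaps))

def select_alpha(X, y, alphas, weight, k, rng):
    # Pick the hyperparameter minimizing CV error + weight * instability.
    folds = kfold_indices(len(y), k, rng)
    scores = []
    for a in alphas:
        err, gap = cv_and_instability(X, y, a, folds)
        scores.append(err + weight * gap)
    return alphas[int(np.argmin(scores))]

def stability_regularized_cv(X, y, alphas, weights, k=5, seed=0):
    # Outer loop: choose the stability weight by nested cross-validation.
    rng = np.random.default_rng(seed)
    outer = kfold_indices(len(y), k, rng)
    outer_err = []
    for weight in weights:
        errs = []
        for held_out in outer:
            train = np.setdiff1d(np.arange(len(y)), held_out)
            # Inner loop: tune alpha on the outer-training rows only.
            a = select_alpha(X[train], y[train], alphas, weight, k, rng)
            w = ridge_fit(X[train], y[train], a)
            errs.append(np.mean((X[held_out] @ w - y[held_out]) ** 2))
        outer_err.append(np.mean(errs))
    best_weight = weights[int(np.argmin(outer_err))]
    # Final pass: tune alpha on all rows with the chosen weight.
    best_alpha = select_alpha(X, y, alphas, best_weight, k, rng)
    return best_alpha, best_weight

# Synthetic demonstration data (not from the paper).
demo_rng = np.random.default_rng(42)
X = demo_rng.normal(size=(80, 10))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.5 * demo_rng.normal(size=80)
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
weights = [0.0, 0.1, 1.0, 10.0]
best_alpha, best_weight = stability_regularized_cv(X, y, alphas, weights)
print(best_alpha, best_weight)
```

Including `0.0` in the weight grid means ordinary k-fold cross-validation is one of the candidates, so the nested procedure can fall back to it when the stability term does not help on the outer folds.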