Concentration inequalities for leave-one-out cross validation
This work addresses the theoretical foundations of cross-validation methods in machine learning. It offers a more general approach that goes beyond Lipschitz assumptions; the advance is incremental, but it broadens applicability.
The paper establishes the reliability of leave-one-out cross-validation by proving that estimator stability is sufficient. It provides concentration bounds in a general framework, with examples including linear regression and kernel density estimation.
In this article we prove that estimator stability is enough to show that leave-one-out cross validation is a sound procedure, by providing concentration bounds in a general framework. In particular, we provide concentration bounds beyond Lipschitz continuity assumptions on the loss or on the estimator. We obtain our results by relying on random variables whose distribution satisfies the logarithmic Sobolev inequality, which gives us a relatively rich class of distributions. We illustrate our method by considering several interesting examples, including linear regression, kernel density estimation, and stabilized/truncated estimators such as stabilized kernel regression.
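For readers less familiar with the procedure, the following is a minimal sketch of leave-one-out cross validation applied to kernel density estimation, one of the examples named above. It is an illustration only and not the article's construction: the Gaussian kernel, the log-likelihood score, the bandwidth grid, and the function names (`gaussian_kernel`, `kde`, `loo_log_likelihood`) are all assumptions made for this example.

```python
import math
import random


def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice for this sketch)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h):
    """Kernel density estimate at x, built from `sample` with bandwidth h."""
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def loo_log_likelihood(sample, h):
    """Average log density of each point under the estimate fitted without it."""
    n = len(sample)
    total = 0.0
    for i in range(n):
        rest = sample[:i] + sample[i + 1:]  # leave observation i out
        density = kde(sample[i], rest, h)
        # Floor the density so that a numerically zero estimate cannot crash log().
        total += math.log(max(density, 1e-300))
    return total / n


# Hypothetical data: 200 draws from a standard normal distribution.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]

# Score a small, arbitrary grid of bandwidths and keep the best one.
bandwidths = [0.1, 0.2, 0.4, 0.8, 1.6]
scores = {h: loo_log_likelihood(sample, h) for h in bandwidths}
best_h = max(scores, key=scores.get)
print("leave-one-out scores:", scores)
print("selected bandwidth:", best_h)
```

Each observation is scored by an estimator that has not seen it, so the averaged score serves as an estimate of out-of-sample performance. Concentration bounds of the kind described in the abstract concern how tightly such an estimate clusters around the quantity it targets.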