ST ML · Mar 9, 2017

Cross-validation

arXiv:1703.03167v128.1214 citations

Originality: Synthesis-oriented

AI Analysis

The paper addresses how machine-learning practitioners should choose an appropriate cross-validation method. Its contribution is incremental, since it synthesizes existing knowledge rather than introducing new procedures.

This survey defines classical cross-validation procedures and analyzes their properties for risk estimation and estimator selection, providing guidelines for choosing the best method based on bias, variance, and overpenalization.

This text is a survey on cross-validation. We define all classical cross-validation procedures and study their properties for two different goals: estimating the risk of a given estimator, and selecting the best estimator among a given family. For the risk estimation problem, we compute the bias (which can also be corrected) and the variance of cross-validation methods. For estimator selection, we begin with a first-order analysis based on expectations. We then explain how to account for second-order terms, both from variance computations and from the usefulness of overpenalization. Together, these analyses yield guidelines for choosing the best cross-validation method for a given learning problem.
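The two goals named in the abstract, estimating the risk of one estimator and selecting the best estimator in a family, can be made concrete with V-fold cross-validation. What follows is a minimal sketch under stated assumptions, not the survey's own method. It assumes squared loss, 1-D inputs, a k-nearest-neighbour regressor as the (hypothetical) family indexed by k, and deterministic interleaved folds on data that is already in random order. The names `fit_knn`, `empirical_risk`, `v_fold_cv`, and `select_by_cv` are illustrative only. Each fold-wise fit sees only about n(1 - 1/V) points, which biases the criterion pessimistically. The optional `corrected` flag applies one standard correction for this bias, commonly attributed to Burman (1989). It may not be the exact form analyzed in the paper.

```python
import math
import random
from statistics import mean


def fit_knn(train, k):
    """Hypothetical estimator: k-nearest-neighbour regression on 1-D inputs."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def empirical_risk(predictor, data):
    """Average squared loss of `predictor` on the (x, y) pairs in `data`."""
    return mean((predictor(x) - y) ** 2 for x, y in data)


def v_fold_cv(data, fit, V, corrected=False):
    """V-fold CV estimate of the risk of the estimator `fit` (squared loss).

    Assumes `data` is already in random order; fold j holds indices j, j+V, ...
    """
    fitted, scores = [], []
    for j in range(V):
        held_out = set(range(j, len(data), V))
        train = [p for i, p in enumerate(data) if i not in held_out]
        valid = [data[i] for i in sorted(held_out)]
        predictor = fit(train)  # trained on about n(1 - 1/V) points
        fitted.append(predictor)
        scores.append(empirical_risk(predictor, valid))
    crit = mean(scores)
    if corrected:
        # Bias correction: add the full-sample training error minus the
        # average error, on all of `data`, of the V fold-wise fits.
        crit += empirical_risk(fit(data), data) - mean(
            empirical_risk(f, data) for f in fitted)
    return crit


def select_by_cv(data, candidates, V=5, corrected=False):
    """Return the k minimising the CV criterion, plus all criterion values."""
    crits = {k: v_fold_cv(data, lambda tr, k=k: fit_knn(tr, k), V, corrected)
             for k in candidates}
    return min(crits, key=crits.get), crits


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(x, math.sin(2 * math.pi * x) + rng.gauss(0, 0.3))
            for x in (rng.random() for _ in range(100))]
    best, crits = select_by_cv(data, [1, 2, 5, 10, 20, 50], V=5)
    print("selected k =", best, "estimated risk =", round(crits[best], 3))
```

Setting `V = len(data)` turns `v_fold_cv` into leave-one-out, since every fold then holds a single point. The survey's guidelines concern how choices such as V, bias correction, and overpenalization trade bias against variance. This sketch only makes those quantities concrete and does not reproduce that analysis.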
