ST LG OC CO ML

Sep 15, 2024

RandALO: Out-of-sample risk estimation in no time flat

Parth Nobel, Daniel LeJeune, Emmanuel J. Candès

arXiv:2409.09781v22.33 citations

h-index: 103

Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the need for efficient out-of-sample risk estimation in machine learning, particularly for hyperparameter tuning. It appears to be an incremental improvement over existing cross-validation (CV) methods.

The paper tackles the problem of expensive out-of-sample risk estimation for models on large high-dimensional datasets by proposing RandALO, a randomized approximate leave-one-out estimator that is consistent and computationally cheaper than K-fold CV.

Estimating out-of-sample risk for models trained on large high-dimensional datasets is an expensive but essential part of the machine learning process, enabling practitioners to optimally tune hyperparameters. Cross-validation (CV) serves as the de facto standard for risk estimation but poorly trades off high bias ($K$-fold CV) for computational cost (leave-one-out CV). We propose a randomized approximate leave-one-out (RandALO) risk estimator that is not only a consistent estimator of risk in high dimensions but also less computationally expensive than $K$-fold CV. We support our claims with extensive simulations on synthetic and real data and provide a user-friendly Python package implementing RandALO available on PyPI as randalo and at https://github.com/cvxgrp/randalo.
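To make the idea of cheap leave-one-out estimation concrete, here is a minimal NumPy sketch for the special case of ridge regression without an intercept. It is an illustration under stated assumptions, not the paper's algorithm and not the `randalo` package API. The function name `ridge_loo_risk` and its parameters are hypothetical. For ridge, the leave-one-out residual equals the in-sample residual divided by $1 - H_{ii}$, where $H = X (X^\top X + \lambda I)^{-1} X^\top$; the sketch optionally estimates the diagonal of $H$ from random Rademacher probes instead of computing it exactly.

```python
import numpy as np


def ridge_loo_risk(X, y, lam, n_probes=None, seed=0):
    """Squared-error leave-one-out risk for ridge regression (no intercept).

    Hypothetical helper for illustration only. Uses the identity
    y_i - yhat_{-i}(x_i) = (y_i - yhat_i) / (1 - H_ii).
    If n_probes is None, diag(H) is computed exactly; otherwise it is
    estimated as the average of z * (H z) over random +/-1 probe vectors z.
    """
    n, p = X.shape
    G = X.T @ X + lam * np.eye(p)  # regularized Gram matrix

    def apply_H(V):
        # Apply the hat matrix H to a vector or to the columns of a matrix.
        return X @ np.linalg.solve(G, X.T @ V)

    residuals = y - apply_H(y)  # in-sample residuals from a single fit

    if n_probes is None:
        # Exact diagonal of H: row-wise inner products of X and G^{-1} X^T.
        h = np.einsum("ij,ji->i", X, np.linalg.solve(G, X.T))
    else:
        rng = np.random.default_rng(seed)
        Z = rng.choice([-1.0, 1.0], size=(n, n_probes))
        # E[z_i * (H z)_i] = H_ii for Rademacher z, so average over probes.
        h = np.mean(Z * apply_H(Z), axis=1)

    return np.mean((residuals / (1.0 - h)) ** 2)
```

With `n_probes=None` the result matches brute-force leave-one-out CV for ridge up to floating-point error, while requiring only one model fit. With a finite number of probes the diagonal is only approximated, so the risk estimate carries Monte Carlo noise that shrinks as `n_probes` grows.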
