ST, LG, OC, CO, ML | Sep 15, 2024

RandALO: Out-of-sample risk estimation in no time flat

arXiv:2409.09781v2 | 3 citations | h-index: 103 | Has Code
AI Analysis

This paper addresses the need for efficient risk estimation in machine learning, particularly for hyperparameter tuning, though it appears to be an incremental improvement over existing CV methods.

The paper tackles the problem of expensive out-of-sample risk estimation for models on large high-dimensional datasets by proposing RandALO, a randomized approximate leave-one-out estimator that is consistent and computationally cheaper than K-fold CV.

Estimating out-of-sample risk for models trained on large high-dimensional datasets is an expensive but essential part of the machine learning process, enabling practitioners to optimally tune hyperparameters. Cross-validation (CV) serves as the de facto standard for risk estimation but poorly trades off high bias ($K$-fold CV) for computational cost (leave-one-out CV). We propose a randomized approximate leave-one-out (RandALO) risk estimator that is not only a consistent estimator of risk in high dimensions but also less computationally expensive than $K$-fold CV. We support our claims with extensive simulations on synthetic and real data and provide a user-friendly Python package implementing RandALO available on PyPI as randalo and at https://github.com/cvxgrp/randalo.
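To make the trade-off in the abstract concrete, the sketch below uses ridge regression, where the leave-one-out residual has a known closed form, $(y_i - \hat{y}_i)/(1 - H_{ii})$, with $H$ the hat matrix. It compares brute-force leave-one-out CV (n refits), the closed-form shortcut (one fit), and a variant that estimates the diagonal of $H$ from random Rademacher probes instead of forming $H$. This is only an illustration of the general idea of randomized leave-one-out estimation under these assumptions; it is not the paper's algorithm and does not use the `randalo` package API. All function names, the problem sizes, and the choice of ridge regression are hypothetical choices made for the example.

```python
import numpy as np


def loo_risk_bruteforce(X, y, lam):
    """Leave-one-out squared error for ridge regression, refitting n times."""
    n, p = X.shape
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        Xi, yi = X[mask], y[mask]
        beta = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
        errs[i] = (y[i] - X[i] @ beta) ** 2
    return errs.mean()


def loo_risk_shortcut(X, y, lam):
    """Same quantity from a single fit, using the exact ridge identity
    loo_residual_i = residual_i / (1 - H_ii)."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)


def loo_risk_randomized(X, y, lam, num_probes, rng):
    """Approximate the shortcut without forming H: estimate diag(H) as the
    average of z * (H z) over random +/-1 probe vectors z."""
    n, p = X.shape
    A = X.T @ X + lam * np.eye(p)
    Z = rng.choice([-1.0, 1.0], size=(n, num_probes))  # Rademacher probes
    HZ = X @ np.linalg.solve(A, X.T @ Z)               # H applied to each probe
    diag_est = np.mean(Z * HZ, axis=1)                 # unbiased estimate of diag(H)
    resid = y - X @ np.linalg.solve(A, X.T @ y)
    return np.mean((resid / (1.0 - diag_est)) ** 2)


# Small synthetic problem (sizes chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
n, p, lam = 60, 10, 1.0
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

risk_brute = loo_risk_bruteforce(X, y, lam)
risk_exact = loo_risk_shortcut(X, y, lam)
risk_rand = loo_risk_randomized(X, y, lam, num_probes=2000, rng=rng)
print(risk_brute, risk_exact, risk_rand)
```

The brute-force and shortcut values agree up to floating-point error because the identity is exact for ridge regression. The randomized value is only close, and its accuracy depends on the number of probes. For models without a closed-form identity, an approximate leave-one-out method has to rely on a local approximation instead, which this sketch does not cover.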

Code Implementations: 1 repo
