OC LG ST MLDec 15, 2025

Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences

arXiv:2512.13123v57.11 citations

Originality Highly original

AI Analysis

This addresses a longstanding theoretical gap for practitioners in machine learning and optimization who need reliable online stopping decisions for SGD, offering a novel framework applicable across both convex and nonconvex problems.

The paper tackles the problem of determining when to stop stochastic gradient descent (SGD) during runtime without pre-specified horizons, by developing anytime-valid confidence sequences that enable statistically valid stopping rules based on observed trajectories. The result provides certificates for suboptimality in convex settings and stationarity in nonconvex settings, with characterized stopping-time complexity under standard stepsize schedules.

The problem of stopping stochastic gradient descent (SGD) in an online manner, based solely on the observed trajectory, is a challenging theoretical problem with significant consequences for applications. While SGD is routinely monitored as it runs, the classical theory of SGD provides guarantees only at pre-specified iteration horizons and offers no valid way to decide, based on the observed trajectory, when further computation is justified. We address this longstanding gap by developing anytime-valid confidence sequences for stochastic gradient methods, which remain valid under continuous monitoring and directly induce statistically valid, trajectory-dependent stopping rules: stop as soon as the current upper confidence bound on an appropriate performance measure falls below a user-specified tolerance. The confidence sequences are constructed using nonnegative supermartingales, are time-uniform, and depend only on observable quantities along the SGD trajectory, without requiring prior knowledge of the optimization horizon. In convex optimization, this yields anytime-valid certificates for weighted suboptimality of projected SGD under general stepsize schedules, without assuming smoothness or strong convexity. In nonconvex optimization, it yields time-uniform certificates for weighted first-order stationarity under smoothness assumptions. We further characterize the stopping-time complexity of the resulting stopping rules under standard stepsize schedules. To the best of our knowledge, this is the first framework that provides statistically valid, time-uniform stopping rules for SGD across both convex and nonconvex settings based solely on its observed trajectory.

View on arXiv PDF

Similar