LG, OC · Aug 7, 2025

Adaptive Batch Size and Learning Rate Scheduler for Stochastic Gradient Descent Based on Minimization of Stochastic First-order Oracle Complexity

arXiv:2508.05302v1
h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses a key optimization challenge for deep learning practitioners, though it is incremental as it builds on prior theoretical insights.

The paper addresses how to choose the batch size and learning rate in mini-batch SGD so as to minimize stochastic first-order oracle complexity. In its experiments, the resulting scheduler converged faster than existing schedulers.

The convergence behavior of mini-batch stochastic gradient descent (SGD) is highly sensitive to the batch size and learning rate settings. Recent theoretical studies have identified the existence of a critical batch size that minimizes stochastic first-order oracle (SFO) complexity, defined as the expected number of gradient evaluations required to reach a stationary point of the empirical loss function in a deep neural network. An adaptive scheduling strategy is introduced to accelerate SGD that leverages theoretical findings on the critical batch size. The batch size and learning rate are adjusted on the basis of the observed decay in the full gradient norm during training. Experiments using an adaptive joint scheduler based on this strategy demonstrated improved convergence speed compared with that of existing schedulers.
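The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea: adjust the batch size and learning rate together whenever the observed full gradient norm has decayed past a threshold. The names `SchedulerState` and `step_schedule`, the multiplicative factors, the threshold decay, and the batch-size cap are all hypothetical choices made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulerState:
    batch_size: int
    lr: float
    threshold: float  # gradient-norm level that triggers the next adjustment


def step_schedule(state, full_grad_norm, batch_factor=2.0, lr_factor=2.0,
                  threshold_decay=0.5, max_batch_size=4096):
    """Return the next scheduler state given the observed full gradient norm.

    All factors are illustrative assumptions, not values from the paper.
    """
    # The gradient norm has not decayed enough yet: keep the current settings.
    if full_grad_norm > state.threshold:
        return state
    # The norm fell to the threshold: scale the batch size and learning rate
    # jointly, and lower the threshold so the next change needs further decay.
    new_batch = min(int(state.batch_size * batch_factor), max_batch_size)
    return SchedulerState(
        batch_size=new_batch,
        lr=state.lr * lr_factor,
        threshold=state.threshold * threshold_decay,
    )


if __name__ == "__main__":
    state = SchedulerState(batch_size=32, lr=0.1, threshold=1.0)
    # Hypothetical sequence of full gradient norms measured during training.
    for norm in [2.0, 1.2, 0.9, 0.6, 0.4]:
        state = step_schedule(state, norm)
        print(norm, state)
```

In a real training loop the full gradient norm would have to be measured or estimated periodically, which is itself costly on large datasets; how the paper handles that is not stated in the abstract.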

