Stochastic Adaptive Gradient Descent Without Descent
This work addresses hyper-parameter tuning in stochastic gradient descent for convex optimization. The proposed method is theoretically grounded and empirically competitive, though it appears incremental, since it adapts an existing deterministic method to the stochastic setting.
The paper tackles the problem of adaptive step-size selection in stochastic convex optimization by introducing a new method that requires no hyper-parameter tuning and uses only first-order stochastic oracles. It proves convergence under various assumptions and shows empirical competitiveness against tuned baselines.
We introduce a new adaptive step-size strategy for convex optimization with stochastic gradients. It exploits the local geometry of the objective function using only a first-order stochastic oracle and requires no hyper-parameter tuning. The method is a theoretically grounded adaptation of the Adaptive Gradient Descent Without Descent method to the stochastic setting. We prove the convergence of stochastic gradient descent with our step-size under various assumptions, and we show that it empirically competes against tuned baselines.
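To make the idea concrete, here is a minimal Python sketch of the deterministic Adaptive Gradient Descent Without Descent step-size rule, plugged naively into a stochastic gradient loop. The deterministic rule takes the step as the minimum of a growth cap, sqrt(1 + theta) times the previous step, and a local inverse-curvature estimate, ||x_k - x_{k-1}|| / (2 ||g_k - g_{k-1}||). This is not the paper's stochastic method, whose exact update is not given in the abstract. The function name `adgd_style_sgd`, the `sampler` helper, the initial step `lam0`, and the choice to evaluate both gradients on the same sample are all assumptions made for illustration.

```python
import math
import random


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in v))


def adgd_style_sgd(grad, sampler, x0, steps, lam0=1e-6, seed=0):
    """SGD with an AdGD-style step size (illustrative sketch only).

    grad(x, sample) -> stochastic gradient at x, as a list of floats
    sampler(rng)    -> draws one sample (hypothetical helper)
    """
    rng = random.Random(seed)
    x_prev = list(x0)
    # The deterministic rule starts from a small step and theta = +inf.
    lam_prev, theta = lam0, math.inf
    g0 = grad(x_prev, sampler(rng))
    x = [a - lam_prev * b for a, b in zip(x_prev, g0)]

    for _ in range(steps):
        sample = sampler(rng)
        g = grad(x, sample)
        # Assumption: reuse the same sample at the previous point so the
        # gradient difference reflects curvature rather than sampling noise.
        g_old = grad(x_prev, sample)

        dx = norm([a - b for a, b in zip(x, x_prev)])
        dg = norm([a - b for a, b in zip(g, g_old)])

        cap = math.sqrt(1.0 + theta) * lam_prev            # limits step growth
        local = dx / (2.0 * dg) if dg > 0.0 else math.inf  # ~ 1 / (2 * local L)
        lam = min(cap, local)
        if math.isinf(lam):  # no curvature information yet: keep the old step
            lam = lam_prev

        theta = lam / lam_prev
        x_prev, x = x, [a - lam * b for a, b in zip(x, g)]
        lam_prev = lam

    return x, lam_prev


if __name__ == "__main__":
    # Noise-free toy problem f(x) = 2 * x^2, whose gradient is 4 * x.
    x, lam = adgd_style_sgd(lambda x, s: [4.0 * x[0]],
                            lambda rng: None, [1.0], steps=60)
    print(x, lam)
```

On the noise-free toy problem, the local estimate settles at 1/(2L) with L = 4, so no step size needs to be supplied by hand. With genuinely noisy gradients this naive variant carries no convergence guarantee, which is the gap the paper's analysis is meant to address.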