OCLGMLJan 7, 2020

Backtracking Gradient Descent allowing unbounded learning rates

arXiv:2001.02005v24 citations
Originality Incremental advance
AI Analysis

This is an incremental theoretical improvement for optimization in machine learning, addressing limitations in learning rate constraints.

The paper tackles the problem of convergence in Gradient Descent by allowing unbounded learning rates, proving convergence under general assumptions and showing that a specific growth rate is optimal for sequence convergence.

In unconstrained optimisation on an Euclidean space, to prove convergence in Gradient Descent processes (GD) $x_{n+1}=x_n-δ_n \nabla f(x_n)$ it usually is required that the learning rates $δ_n$'s are bounded: $δ_n\leq δ$ for some positive $δ$. Under this assumption, if the sequence $x_n$ converges to a critical point $z$, then with large values of $n$ the update will be small because $||x_{n+1}-x_n||\lesssim ||\nabla f(x_n)||$. This may also force the sequence to converge to a bad minimum. If we can allow, at least theoretically, that the learning rates $δ_n$'s are not bounded, then we may have better convergence to better minima. A previous joint paper by the author showed convergence for the usual version of Backtracking GD under very general assumptions on the cost function $f$. In this paper, we allow the learning rates $δ_n$ to be unbounded, in the sense that there is a function $h:(0,\infty)\rightarrow (0,\infty )$ such that $\lim _{t\rightarrow 0}th(t)=0$ and $δ_n\lesssim \max \{h(x_n),δ\}$ satisfies Armijo's condition for all $n$, and prove convergence under the same assumptions as in the mentioned paper. It will be shown that this growth rate of $h$ is best possible if one wants convergence of the sequence $\{x_n\}$. A specific way for choosing $δ_n$ in a discrete way connects to Two-way Backtracking GD defined in the mentioned paper. We provide some results which either improve or are implicitly contained in those in the mentioned paper and another recent paper on avoidance of saddle points.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes