OCLGDSNAJul 7, 2020

Asymptotic behaviour of learning rates in Armijo's condition

arXiv:2007.03618v11 citations
Originality Incremental advance
AI Analysis

This provides theoretical insights into gradient descent convergence for optimization researchers, but it is incremental as it builds on prior work on Backtracking GD.

The paper analyzes the asymptotic behavior of learning rates in Armijo's condition for gradient descent, proving that if a sequence converges to a non-degenerate critical point, the learning rates are bounded and quantifiable in terms of Hessian norms, and it shows that Backtracking GD has correct units according to a definition by Zeiler.

Fix a constant $0<α<1$. For a $C^1$ function $f:\mathbb{R}^k\rightarrow \mathbb{R}$, a point $x$ and a positive number $δ>0$, we say that Armijo's condition is satisfied if $f(x-δ\nabla f(x))-f(x)\leq -αδ||\nabla f(x)||^2$. It is a basis for the well known Backtracking Gradient Descent (Backtracking GD) algorithm. Consider a sequence $\{x_n\}$ defined by $x_{n+1}=x_n-δ_n\nabla f(x_n)$, for positive numbers $δ_n$ for which Armijo's condition is satisfied. We show that if $\{x_n\}$ converges to a non-degenerate critical point, then $\{δ_n\}$ must be bounded. Moreover this boundedness can be quantified in terms of the norms of the Hessian $\nabla ^2f$ and its inverse at the limit point. This complements the first author's results on Unbounded Backtracking GD, and shows that in case of convergence to a non-degenerate critical point the behaviour of Unbounded Backtracking GD is not too different from that of usual Backtracking GD. On the other hand, in case of convergence to a degenerate critical point the behaviours can be very much different. We run some experiments to illustrate that both scenrios can really happen. In another part of the paper, we argue that Backtracking GD has the correct unit (according to a definition by Zeiler in his Adadelta's paper). The main point is that since learning rate in Backtracking GD is bound by Armijo's condition, it is not unitless.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes