OCLGDSMLMar 11, 2020

Coordinate-wise Armijo's condition: General case

arXiv:2003.05252v11 citations
AI Analysis

This work addresses a computational bottleneck in optimization methods for researchers and practitioners, but it is incremental as it extends previous work on coordinate-wise conditions to general functions.

The paper tackles the problem of developing a systematic algorithm for obtaining step sizes satisfying the coordinate-wise Armijo's condition in optimization, and proves convergence results with experimental validation on functions like Rosenbrock's function.

Let $z=(x,y)$ be coordinates for the product space $\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$. Let $f:\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}\rightarrow \mathbb{R}$ be a $C^1$ function, and $\nabla f=(\partial _xf,\partial _yf)$ its gradient. Fix $0<α<1$. For a point $(x,y) \in \mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$, a number $δ>0$ satisfies Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} f(x-δ\partial _xf,y-δ\partial _yf)-f(x,y)\leq -αδ(||\partial _xf||^2+||\partial _yf||^2). \end{eqnarray*} In one previous paper, we proposed the following {\bf coordinate-wise} Armijo's condition. Fix again $0<α<1$. A pair of positive numbers $δ_1,δ_2>0$ satisfies the coordinate-wise variant of Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} [f(x-δ_1\partial _xf(x,y), y-δ_2\partial _y f(x,y))]-[f(x,y)]\leq -α(δ_1||\partial _xf(x,y)||^2+δ_2||\partial _yf(x,y)||^2). \end{eqnarray*} Previously we applied this condition for functions of the form $f(x,y)=f(x)+g(y)$, and proved various convergent results for them. For a general function, it is crucial - for being able to do real computations - to have a systematic algorithm for obtaining $δ_1$ and $δ_2$ satisfying the coordinate-wise version of Armijo's condition, much like Backtracking for the usual Armijo's condition. In this paper we propose such an algorithm, and prove according convergent results. We then analyse and present experimental results for some functions such as $f(x,y)=a|x|+y$ (given by Asl and Overton in connection to Wolfe's method), $f(x,y)=x^3 sin (1/x) + y^3 sin(1/y)$ and Rosenbrock's function.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes