Some convergent results for Backtracking Gradient Descent method on Banach spaces
This provides theoretical guarantees for optimization in infinite-dimensional spaces, but it is incremental as it extends known results to Banach spaces with weak topology.
The paper tackles the convergence of Backtracking Gradient Descent on Banach spaces, proving that under Condition C and bounded Hessian, cluster points are critical points, sequences either diverge or have vanishing steps, and generic choices avoid saddle points in weak convergence.
Our main result concerns the following condition: {\bf Condition C.} Let $X$ be a Banach space. A $C^1$ function $f:X\rightarrow \mathbb{R}$ satisfies Condition C if whenever $\{x_n\}$ weakly converges to $x$ and $\lim _{n\rightarrow\infty}||\nabla f(x_n)||=0$, then $\nabla f(x)=0$. We assume that there is given a canonical isomorphism between $X$ and its dual $X^*$, for example when $X$ is a Hilbert space. {\bf Theorem.} Let $X$ be a reflexive, complete Banach space and $f:X\rightarrow \mathbb{R}$ be a $C^2$ function which satisfies Condition C. Moreover, we assume that for every bounded set $S\subset X$, then $\sup _{x\in S}||\nabla ^2f(x)||<\infty$. We choose a random point $x_0\in X$ and construct by the Local Backtracking GD procedure (which depends on $3$ hyper-parameters $α,β,δ_0$, see later for details) the sequence $x_{n+1}=x_n-δ(x_n)\nabla f(x_n)$. Then we have: 1) Every cluster point of $\{x_n\}$, in the {\bf weak} topology, is a critical point of $f$. 2) Either $\lim _{n\rightarrow\infty}f(x_n)=-\infty$ or $\lim _{n\rightarrow\infty}||x_{n+1}-x_n||=0$. 3) Here we work with the weak topology. Let $\mathcal{C}$ be the set of critical points of $f$. Assume that $\mathcal{C}$ has a bounded component $A$. Let $\mathcal{B}$ be the set of cluster points of $\{x_n\}$. If $\mathcal{B}\cap A\not= \emptyset$, then $\mathcal{B}\subset A$ and $\mathcal{B}$ is connected. 4) Assume that $X$ is separable. Then for generic choices of $α,β,δ_0$ and the initial point $x_0$, if the sequence $\{x_n\}$ converges - in the {\bf weak} topology, then the limit point cannot be a saddle point.