OCLGNov 12, 2019

Shadowing Properties of Optimization Algorithms

arXiv:1911.05206v120 citations
Originality Incremental advance
AI Analysis

This work provides a theoretical foundation for using continuous-time methods in optimization, which could benefit researchers and practitioners in machine learning and numerical analysis.

The paper addresses the issue of exponential error growth between optimization algorithms and their ODE approximations by showing that, under additional regularity assumptions, Gradient Descent and Heavy-ball methods exhibit shadowing properties, allowing small perturbations to mitigate this problem.

Ordinary differential equation (ODE) models of gradient-based optimization methods can provide insights into the dynamics of learning and inspire the design of new algorithms. Unfortunately, this thought-provoking perspective is weakened by the fact that, in the worst case, the error between the algorithm steps and its ODE approximation grows exponentially with the number of iterations. In an attempt to encourage the use of continuous-time methods in optimization, we show that, if some additional regularity on the objective is assumed, the ODE representations of Gradient Descent and Heavy-ball do not suffer from the aforementioned problem, once we allow for a small perturbation on the algorithm initial condition. In the dynamical systems literature, this phenomenon is called shadowing. Our analysis relies on the concept of hyperbolicity, as well as on tools from numerical analysis.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes