OCLGMLMay 1, 2018

Direct Runge-Kutta Discretization Achieves Acceleration

arXiv:1805.00521v5112 citations
Originality Incremental advance
AI Analysis

This provides an incremental improvement in optimization methods for machine learning by enabling faster convergence with standard numerical integrators.

The paper tackles the problem of accelerating gradient-based optimization by directly discretizing a second-order ODE related to Nesterov's method, achieving convergence rates of O(N^{-2s/(s+1)}) for smooth functions and even faster rates under a local flatness condition.

We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method. When the function is smooth enough, we show that acceleration can be achieved by a stable discretization of this ODE using standard Runge-Kutta integrators. Specifically, we prove that under Lipschitz-gradient, convexity and order-$(s+2)$ differentiability assumptions, the sequence of iterates generated by discretizing the proposed second-order ODE converges to the optimal solution at a rate of $\mathcal{O}({N^{-2\frac{s}{s+1}}})$, where $s$ is the order of the Runge-Kutta numerical integrator. Furthermore, we introduce a new local flatness condition on the objective, under which rates even faster than $\mathcal{O}(N^{-2})$ can be achieved with low-order integrators and only gradient information. Notably, this flatness condition is satisfied by several standard loss functions used in machine learning. We provide numerical experiments that verify the theoretical rates predicted by our results.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes